Abstract

In many applications in science and engineering one must rely on coarsely quantized and often unreliable noisy measurements in order to accurately and reliably estimate quantities of interest. This scenario arises, for instance, in distributed wireless sensor networks where measurements made at remote sensors need to be fused at a host site in order to decipher an information-bearing signal. Resources such as bandwidth, power, and hardware are usually limited and shared across the network. Consequently, each sensor may be severely constrained in the amount of information it can communicate to the host and the complexity of the processing it can perform.

In this thesis, we develop a versatile framework for designing low-complexity algorithms for efficient digital encoding of the measurements at each sensor, and for accurate signal estimation from these encodings at the host. We show that a properly designed and often easily implemented control input added prior to signal quantization can significantly enhance overall system performance. In particular, efficient estimators can be constructed and used with optimized pseudo-noise, deterministic, and feedback-based control inputs, resulting in a hierarchy of practical systems with very attractive performance-complexity characteristics.

Thesis Supervisor: Gregory W. Wornell

Title: Cecil and Ida Green Associate Professor of Electrical Engineering
Chapter 1


Introduction 


There is a wide range of applications in science and engineering where we wish to decipher signals from noisy measurements, and where system constraints force us to rely on a quantized or coarse description of those measurements. Representative examples include analog-to-digital (A/D) conversion, lossy compression, and decentralized data fusion. Indeed, in many data fusion problems the available resources place constraints on the type and the amount of data that can be exploited at the fusion center. Data fusion problems arise in a very broad and diverse range of applications, including distributed sensing for military applications [8], database management systems [2], target tracking and surveillance for robot navigation [22, 28] and radar applications [35], and medical imaging [9].

Recently, data fusion has attracted considerable attention in the context of distributed sensing problems, due to the continuing reduction in the cost of sensors and computation, and the performance improvements that inherently emanate from the use of multiple sensors [33]. Unlike classical multi-sensor fusion, where the data collected by the sensors are communicated in full to a central processor, it is often desirable to perform some form of decentralized processing at the sensor before communicating the acquired information to the central processor in a condensed and often lossy form.

Various challenging signal detection and estimation problems have surfaced in such distributed sensing applications. Naturally, it is important to determine the extent to which decentralized preprocessing limits performance and to develop effective low-complexity methods for performing decentralized data fusion. As Hall et al. [19] show in the context of decentralized estimation, depending on the particular scenario, distributed data processing may range from being optimal, in the sense that no loss in performance is incurred by simply communicating the local estimates computed at each sensor, to being catastrophic, in the sense that preprocessing at each sensor can completely destroy the underlying structure in the joint set of sensor measurements. Similar performance characteristics are exhibited in decentralized signal detection problems [5, 30]. Although for many important cases of practical interest decentralized signal detection and estimation methods have been developed for locally optimized processing at each sensor and subsequent efficient data fusion at the host (see [10, 34, 6, 30, 19, 11, 7, 13, 24] and the references therein), a number of real-time decentralized fusion problems are still largely unexplored.

In this thesis we focus on an important real-time decentralized fusion problem that arises in networks of distributed wireless sensors used for collecting macroscopic measurements. In particular, such networks are naturally suited for monitoring temporal variations in the average levels of environmental parameters. Representative examples include monitoring concentration levels in the atmosphere for detecting chemical or biological hazards, and measuring temperature fluctuations in the ocean surface for weather forecasting applications.

A block diagram of such a wireless sensor network is depicted in Fig. 1-1. In such a network, the local measurements made at each sensor must be communicated with minimal delay to a host over a wireless channel, where they must be effectively combined to decipher the information-bearing signal. Since bandwidth must often be shared across such a sensor network, the effective data rate at which each sensor can reliably communicate to the host over the wireless channel may be severely limited, often to a few bits of information per acquired sensor measurement. The need for power-efficient design may also constrain the processing complexity available at each sensor, but usually not at the host, which typically possesses more processing power than each individual sensor. Depending upon bandwidth availability in these wireless networks, the host may or may not broadcast information back to the remote sensors so as to improve the quality of the future sensor data it receives.

Figure 1-1: Block diagram of a wireless sensor network with bandwidth and power constraints.

This type of problem may also arise in networks that are not wireless, but where the sensors are intrinsically limited by design. For instance, concentrations of chemical or biological agents are often computed by observing the color or the conformation of certain indicator/sensor molecules. In many of these cases, these sensor molecules exhibit only a finite set of possible outputs. In addition, there is often very limited flexibility in terms of affecting or biasing future outputs exhibited by these indicator molecules. Such networks of resolution-limited sensors are also employed by a number of biological systems for performing vital sensory tasks, suggesting that the type of processing performed by these systems somehow corresponds to an efficient use of resources [15, 16, 23]. For instance, it has been conjectured that certain types of crayfish enhance the ability of their crude sensory neurons to reliably detect weak signals sent by their predators by exploiting remarkably simple and, at first sight, counterintuitive preprocessing [16].

Various types of data fusion problems of the form depicted in Fig. 1-1 have been examined; in particular, limitations on the amount of information that each sensor can communicate to the host are present in a number of decentralized detection problems [10, 32]. Another example that fits the same framework is the so-called “CEO problem” [4], where a number of agents obtain noisy observations of a signal of interest and have to communicate this information to a CEO who can absorb at most R bits of information per second.

(b) Estimation of information-bearing signal at the host from the encoded data streams

Figure 1-2: Framework for signal estimation from noisy measurements in sensor networks.

In this thesis, we focus on the problem of signal estimation in the context of sensor networks of the form depicted in Fig. 1-1, where system constraints limit the amount of information that each sensor can communicate to the host, and where there may also exist constraints on the amount of computation available at each sensor. It is very convenient to decompose this problem into the two stages shown in Fig. 1-2. First, as depicted in Fig. 1-2(a), at each sensor the acquired noisy measurements of the information-bearing signal must be encoded into an efficient digital representation. Then, as shown in Fig. 1-2(b), the data streams from all sensors are to be effectively combined at the host in order to obtain an accurate signal estimate.

As we might expect, these two design stages of data encoding and signal estimation are very closely coupled. At each sensor, the measurements have to be efficiently encoded so as to enable the host to obtain an accurate signal estimate. Conversely, the host should exploit all the available information about the encoding strategy and, in particular, when feedback is available, it may broadcast feedback information to the sensors so as to improve the quality of the future sensor encodings it receives. As we demonstrate in this thesis, the performance of the overall system strongly depends on the type of processing complexity constraints that are present at the sensor for encoding the sensor measurements.


1.1 Outline of the Thesis


In this thesis we develop a framework for designing computationally efficient algorithms for effective digital encoding of the measurements at each sensor, and for accurate signal estimation from these encodings at the host. In Chapters 2-4 we focus on the case where the information-bearing signal varies slowly enough that we may view it as static over the observation interval. We begin by examining in detail in Chapter 2 a class of low-complexity algorithms for encoding noisy measurements collected from a single sensor in the static case. Specifically, we consider encodings in the form of a suitably designed control input added prior to signal quantization. Depending on the amount of information that the estimator can exploit about the control input and the limitations in processing complexity at the encoder, a number of key encoding strategies and associated estimator structures are presented. For a number of scenarios of practical interest, we develop efficient host estimators that can be used with optimized control inputs at the sensor, resulting in a hierarchy of systems with very attractive performance-complexity characteristics.

In Chapter 3, we develop a number of important extensions of the systems developed in Chapter 2, which can be useful in the context of networks of sensors. We first develop optimized multi-sensor extensions of the single-sensor encoding and estimation strategies for all the scenarios considered in Chapter 2. These systems have a number of important potential applications, especially in distributed sensing networks where there are physical limitations in the sensor design, or bandwidth and power constraints. We also develop extensions of these encoders and estimators for scenarios where prior information about the information-bearing signal is available. In addition, we consider the case where the sensor noise power level is also unknown, and develop the performance limits and the associated extensions of the encoders and estimators for all the scenarios considered in Chapter 2.

In Chapter 4, we consider more general encoding strategies for the static signal case. These encoders require more complex processing than the encoders employing quantizer bias control of Chapters 2-3 and are therefore attractive when there are less stringent complexity constraints at the encoder. As we demonstrate, we can develop refinable encoding and estimation strategies that asymptotically achieve the best possible performance based on the original sensor measurements. In this sense, we show that using a suitably designed digitized description of the acquired noisy measurements does not incur any performance loss in signal estimation.

In Chapter 5, we consider a number of extensions of the static-case encoding strategies that encompass a broad class of time-varying signals. In particular, when the same time-varying signal is observed at each sensor, a rich class of algorithms can be designed for measurement encoding by exploiting the encoding principles for the static problem. In fact, these methods can be applied to a large class of signal models, namely, signals that can be characterized by conventional state-space models. As we also show, for such information-bearing signals we can design effective estimation algorithms based on extensions of conventional Kalman filtering solutions.

Finally, a summary of the main contributions of this thesis is given in Chapter 6, along with a representative collection of potentially interesting directions for future research that


Chapter 2


Encoding from Noisy Measurements via Quantizer Bias Control: Static Case


In developing methods for overcoming the power/bandwidth constraints that may arise across a sensor network, or the dynamic range and resolution constraints at each sensor, it is instructive to first examine the single-sensor problem. In fact, this special case captures many of the key design and performance issues that arise in the context of networks of sensors. The block diagram corresponding to a single sensor is shown in Fig. 2-1, where $A[n]$ denotes the information-bearing signal, $v[n]$ represents sensor noise, $s[n]$ denotes the sensor measurement sequence, and $y[n]$ denotes the sequence of $M$-ary symbols encoded at the sensor and used at the host to obtain a signal estimate $\hat{A}[n]$. Consistent with the system constraints, throughout the thesis we focus on developing algorithms that generate encoded sequences whose average encoding rate does not exceed one $M$-ary symbol per available sensor measurement. The task is then to design the encoder at the sensor and the associated estimator from the encodings at the host so as to optimize the quality of the host estimate.

Figure 2-1: Block diagram of encoding the noisy measurements at the sensor and signal estimation from these encodings at the host.

To illustrate some of the key issues that may arise in the encoder design, it is insightful to consider the static case, i.e., the case where the signal $A[n]$ varies slowly enough that we may view it as static over the observation interval. Given a fixed time instant $N$, we can easily devise a method for efficiently encoding the $N$ sensor measurements $s[1], s[2], \ldots, s[N]$ into a sequence of $N$ $M$-ary symbols $y[1], y[2], \ldots, y[N]$, provided $N$ is large. Specifically, consider the following algorithm:

At the sensor:

(i) compute an estimate of the static information-bearing signal using the $N$ sensor measurements;

(ii) quantize the estimate using a uniform quantizer with $M^N$ quantization levels;

(iii) communicate to the host the quantized level by means of the $N$ $M$-ary symbols $y[1], y[2], \ldots, y[N]$.

At the host:

(i) reconstruct the “quantized” estimate using $y[1], y[2], \ldots, y[N]$.

Clearly, since the number of available quantization levels in step (ii) of the encoder grows exponentially with the number of available observations $N$, the error between the “quantized” estimate used at the host and the original sensor estimate produced in step (i) of the encoder (i.e., the estimate prior to quantization) decays exponentially fast with $N$.
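The one-shot scheme above can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: the sample mean used in step (i), the assumed known range $[-\Delta, \Delta]$ used for clipping, and the names `one_shot_encode` and `one_shot_decode` are all hypothetical choices. The host reconstruction is the midpoint of the selected cell, so it differs from the sensor's unquantized estimate by at most $\Delta/M^N$.

```python
import random


def one_shot_encode(measurements, M, delta):
    """Hypothetical sketch of the one-shot sensor encoder, steps (i)-(iii).

    Assumes the parameter lies in [-delta, delta]. Uses floating point, so it is
    only meant for small N (M**N must be representable as a float).
    """
    N = len(measurements)
    # Step (i): estimate the static signal; the sample mean is an assumed choice.
    estimate = sum(measurements) / N
    estimate = min(max(estimate, -delta), delta)
    # Step (ii): uniform quantizer with M**N cells covering [-delta, delta].
    levels = M ** N
    width = 2.0 * delta / levels
    index = min(int((estimate + delta) / width), levels - 1)
    # Step (iii): write the cell index as N base-M digits, most significant first.
    digits = []
    for _ in range(N):
        digits.append(index % M)
        index //= M
    digits.reverse()
    return digits, estimate


def one_shot_decode(digits, M, delta):
    """Host: rebuild the cell index from the N M-ary symbols and return the cell midpoint."""
    index = 0
    for d in digits:
        index = index * M + d
    width = 2.0 * delta / M ** len(digits)
    return -delta + (index + 0.5) * width


rng = random.Random(1)
data = [0.3 + rng.gauss(0.0, 1.0) for _ in range(12)]
digits, est = one_shot_encode(data, 2, 4.0)
print(digits, est, one_shot_decode(digits, 2, 4.0))
```

Nothing is available at the host until all $N$ digits arrive, which is exactly the lack of refinability discussed next.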

A major disadvantage of such an encoding scheme, however, is that it is not refinable, namely, it provides a one-shot description: no encodings are available to the host for forming estimates before time $N$, and no encodings are available after time $N$ to further refine the quality of the host estimate. Furthermore, this encoding scheme assumes that there is complete freedom in designing the $M^N$-level quantizer. However, this is often not the case, e.g., in problems where the sensors are intrinsically limited by design. For these reasons, in this thesis we instead focus on designing refinable encoding strategies.

One of the simplest refinable encoding strategies that can be constructed consists of quantizing each noisy measurement at the sensor by means of an $M$-level quantizer. As we show in this chapter, however, this simple encoding scheme can have very poor performance characteristics, in terms of overcoming the power/bandwidth constraints across the network, or the dynamic range and resolution constraints at the sensor. As a means of improving the effective digital encoding, we may consider the use of a control input added to the information-bearing signal prior to quantization at the sensor. The block diagram corresponding to a single sensor in the context of such an encoding scheme is shown in Fig. 2-2, where $w[n]$ is a control input and, as in Fig. 2-1, $A[n]$ denotes the information-bearing signal, $v[n]$ represents sensor noise, and $y[n]$ denotes the quantized signal that is sent to the central site.

Figure 2-2: Signal estimation based on digital encodings which are generated by adding a suitably designed control input prior to signal quantization.
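A quick simulation hints at why plain quantization can perform poorly and how a control input helps. It is a sketch under stated assumptions: Gaussian sensor noise, Gaussian pseudo-noise as the control input, the two-level quantizer $\operatorname{sgn}$, and hypothetical function names. When $A$ sits several $\sigma_v$ away from the threshold, almost every output is identical, so the outputs carry little information about $A$; adding pseudo-noise spreads the outputs, and their empirical frequency tracks $\Pr(y[n] = +1) = \Pr(v[n] + w[n] > -A)$.

```python
import math
import random


def sign_encoder(A, sigma_v, sigma_w, N, rng):
    """Simulate y[n] = sgn(A + v[n] + w[n]) with Gaussian v (std sigma_v) and
    Gaussian pseudo-noise control w (std sigma_w); the Gaussian choice is an assumption."""
    outputs = []
    for _ in range(N):
        x = A + rng.gauss(0.0, sigma_v) + rng.gauss(0.0, sigma_w)
        outputs.append(1 if x > 0 else -1)
    return outputs


def prob_plus(A, sigma_alpha):
    """Pr(y[n] = +1) = Pr(alpha > -A) for Gaussian alpha = v + w with std sigma_alpha."""
    return 0.5 * math.erfc(-A / (sigma_alpha * math.sqrt(2.0)))


rng = random.Random(0)
A, sigma_v = 4.0, 1.0
no_control = sign_encoder(A, sigma_v, 0.0, 10000, rng)
with_control = sign_encoder(A, sigma_v, 4.0, 10000, rng)
print(sum(y == 1 for y in no_control) / len(no_control), prob_plus(A, sigma_v))
print(sum(y == 1 for y in with_control) / len(with_control),
      prob_plus(A, math.hypot(sigma_v, 4.0)))
```

Without control, virtually all outputs equal $+1$; with the pseudo-noise, the fraction of $+1$ outputs stays well away from 1 and still depends on $A$.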

In this chapter we focus on the static case of the estimation problem depicted in Fig. 2-2, in which $A[n] = A$, i.e., we examine the problem of estimating a noise-corrupted unknown parameter $A$ via quantized observations. This case reveals several key features of signal estimation from quantized observations obtained via a network of sensor encoders, each comprising a control input and a quantizer; in Chapter 5 we develop extensions of our analysis to the dynamic scenario where $A[n]$ is time-varying.

Several basic variations of the encoding and estimation problem depicted in Fig. 2-2 can arise in practice, which differ in the amount of information about the control input that is available for estimation and the associated freedom (or available encoding complexity) in the control input selection. In this chapter we develop effective control input selection strategies and associated estimators for all these different scenarios. In particular, for pseudo-noise control inputs whose statistical characterization alone is exploited at the receiver, we show that there is an optimal power level for minimizing the mean-square estimation error (MSE). The existence of a non-zero optimal pseudo-noise power level reveals strong connections to the phenomenon of stochastic resonance, which is encountered in a number of physical nonlinear systems where thresholding occurs and where noise is often exploited for signal enhancement [3, 16, 18]. Performance can be further enhanced if detailed knowledge of the applied control waveform is exploited at the receiver. In this scenario, we develop methods for judiciously selecting the control input from a suitable class of periodic waveforms for any given system. Finally, for scenarios where feedback from the quantized output to the control input is available, we show that, when combined with suitably designed receivers, these signal quantizers come within a small loss of the quantizer-free performance.¹ In the process we develop a framework for constructing the control input from past observations and design computationally efficient estimators that effectively optimize performance in terms of MSE.

The outline of this chapter is as follows. In Section 2.1 we describe the static-case estimation problem associated with the system depicted in Fig. 2-2. In Section 2.2 we develop the estimation performance limits for a number of important scenarios. In Section 2.3 we design control inputs and associated estimators for each of these distinct scenarios, which achieve the performance limits developed in Section 2.2. Finally, in Section 3.1 we examine a network generalization of the scenario depicted in Fig. 2-2, in which signal estimation is based on quantized observations collected from multiple sensors.

2.1 System Model

As outlined above, in this chapter we consider the problem of estimating an unknown parameter $A$ from observation of

$$ y[n] = F(A + v[n] + w[n]), \qquad n = 1, 2, \ldots, N, \tag{2.1} $$

where the sensor noise $v[n]$ is an independent identically distributed (IID) process, $w[n]$ is a control input, and the function $F(\cdot)$ is an $M$-level quantizer, with the quantized output $y[n]$ taking $M$ distinct values $Y_1, Y_2, \ldots, Y_M$, i.e.,

$$ F(x) = \begin{cases} Y_i & \text{if } X_{i-1} < x \le X_i, \text{ for } 2 \le i \le M, \\ Y_1 & \text{otherwise}, \end{cases} \tag{2.2a} $$

where $X_0 = -\infty$ and $X_M = \infty$. Without loss of generality, we assume that the quantizer levels are uniformly spaced, i.e.,

$$ Y_i = -(M+1) + 2i, \qquad i = 1, 2, \ldots, M; \tag{2.2b} $$

any other set of distinct quantization levels is equivalent to (2.2b) in the sense that the two sets are related by means of an invertible transformation. We also define the intermediate sequence

$$ s[n] = A + v[n] = A + \sigma_v \tilde{v}[n]. \tag{2.3} $$

¹Although the feedback loop can be entirely implemented at the sensor, sensor complexity is reduced by having the feedback information come from the central site. This is especially appealing in wireless networks where power resources at the central site are often such that there is plenty of effective bandwidth available for broadcasting high-resolution control information.

We will frequently be interested in a measure of predicted performance for a family of sensor noises parameterized by $\sigma_v$ in (2.3), arising from scaling an IID noise sequence $\tilde{v}[n]$. We use the notation $p_z(\cdot)$ to denote the probability density function (PDF) of any sample of an IID sequence $z[n]$, and $\mathcal{C}_z(\cdot)$ to denote one minus the corresponding cumulative distribution, i.e.,

$$ \mathcal{C}_z(x) = \int_x^{\infty} p_z(t)\, dt. $$

We shall refer to an IID noise process as admissible if the associated PDF is non-zero and smooth (i.e., $C^1$) almost everywhere. Throughout this chapter, we assume that all noise processes are admissible, including $v[n]$ as well as $w[n]$ when $w[n]$ is viewed as a pseudo-noise process. Furthermore, when referring to a Gaussian process we assume it is IID and zero-mean, unless we specify otherwise.
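The observation model (2.1) with the quantizer (2.2a)-(2.2b) can be written in a few lines. This is a minimal sketch with hypothetical function names; thresholds are passed as the increasing list $[X_1, \ldots, X_{M-1}]$, and an input exactly on a threshold is assigned to the lower cell, matching $X_{i-1} < x \le X_i$ (for continuous noise such ties have probability zero).

```python
import bisect


def uniform_levels(M):
    """Quantizer output levels Y_i = -(M + 1) + 2i, i = 1, ..., M, from (2.2b)."""
    return [-(M + 1) + 2 * i for i in range(1, M + 1)]


def quantize(x, thresholds):
    """M-level quantizer F(x) from (2.2a).

    thresholds = [X_1, ..., X_{M-1}] in increasing order; X_0 = -inf and X_M = +inf
    are implicit. Returns Y_i when X_{i-1} < x <= X_i.
    """
    levels = uniform_levels(len(thresholds) + 1)
    # bisect_left counts the thresholds strictly below x, which equals i - 1.
    return levels[bisect.bisect_left(thresholds, x)]


def observe(A, v, w, thresholds):
    """One quantized observation y[n] = F(A + v[n] + w[n]) from (2.1)."""
    return quantize(A + v + w, thresholds)


print(uniform_levels(4))                 # [-3, -1, 1, 3]
print(quantize(1.5, [-2.0, 0.0, 2.0]))   # 1
```

With `thresholds=[0.0]` this reduces to the symmetric two-level quantizer used later in the chapter.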
2.2 Performance Limits for Controllers with Quantizer Bias Control


In this section we quantify the performance degradation that results from estimating $A$ based on observation of $y[n]$ instead of $s[n]$. We first introduce the concept of information loss, which we use as a figure of merit to design quantizer systems and evaluate the associated estimators. We then present a brief preview of performance limits based on this notion for a number of important scenarios, and finally develop these performance limits in Sections 2.2.1-2.2.3.

We define the information loss for a quantizer system as the ratio of the Cramer-Rao bounds for unbiased estimates of the parameter $A$ obtained via $y[n]$ and $s[n]$, respectively, i.e.,

$$ \mathcal{L}(A) = \frac{B(A; \mathbf{y}^N)}{B(A; \mathbf{s}^N)}, \tag{2.4} $$

where $B(A; \mathbf{y}^N)$ is the Cramer-Rao bound [31] for unbiased estimation of $A$ from²

$$ \mathbf{y}^N = \begin{bmatrix} y[1] & y[2] & \cdots & y[N] \end{bmatrix}^T, \tag{2.5} $$

where $y[n]$ is given by (2.1), and where $B(A; \mathbf{s}^N)$ and $\mathbf{s}^N$ are defined similarly. We often consider the information loss (2.4) in dB (i.e., $10 \log_{10} \mathcal{L}(A)$); it represents the additional MSE in dB that arises from observing $y[n]$ instead of $s[n]$ in the context of efficient estimation of $A$. From this perspective, better systems achieve smaller information loss over the range of parameter values of interest.

Taking into account the inherent dynamic range limitations of these signal quantizers, we assume that the unknown parameter $A$ takes values in the range $(-\Delta, \Delta)$, with $\Delta$ assumed to be known. Often, the degradation of the estimation quality is conveniently characterized in terms of the ratio $\chi = \Delta/\sigma_v$, which we may view as a measure of peak signal-to-noise ratio (peak SNR).

²The use of the term information loss follows from the fact that (2.4) also equals the inverse of the ratio of the associated Fisher information quantities.

Worst-case performance is used to characterize the overall system. Accordingly, we define the worst-case Cramer-Rao bound and worst-case information loss via

$$ B_{\max}(\Delta) = \sup_{|A| < \Delta} B(A; \mathbf{y}^N) \tag{2.6} $$

and

$$ \mathcal{L}_{\max}(\Delta) = \sup_{|A| < \Delta} \mathcal{L}(A), \tag{2.7} $$

respectively. Both the worst-case Cramer-Rao bound and the worst-case information loss are functions of other system parameters, such as $\sigma_v$ and $F(\cdot)$, the dependence on which is suppressed for convenience in the above definitions.
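Numerically, the suprema in (2.6)-(2.7) are often approximated by evaluating the bound on a dense grid. The sketch below assumes the bound is available as a Python callable and is continuous, so that the supremum over the open interval equals the maximum over the closed one; `worst_case` and `loss_db` are hypothetical helper names, and the quadratic bound in the usage lines is a made-up toy function, not a result from the text.

```python
import math


def worst_case(func, delta, num_points=2001):
    """Approximate sup_{|A| < delta} func(A) on a uniform grid over [-delta, delta].

    Including the endpoints is harmless when func is continuous, since the
    supremum over the open interval then equals the maximum over the closed one.
    """
    grid = [-delta + 2.0 * delta * k / (num_points - 1) for k in range(num_points)]
    return max(func(A) for A in grid)


def loss_db(b_y, b_s):
    """Information loss (2.4) expressed in dB: 10 log10(B(A; y^N) / B(A; s^N))."""
    return 10.0 * math.log10(b_y / b_s)


# Toy example with a made-up bound B(A; y^N) = 1 + A**2 and B(A; s^N) = 1:
print(worst_case(lambda A: 1.0 + A * A, 2.0))   # 5.0
print(loss_db(5.0, 1.0))
```

A grid is the simplest choice; when the bound is known to be monotone in $|A|$, evaluating it at $A = \pm\Delta$ suffices.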

As a consequence of the linear model (2.3), the Cramer-Rao bound $B(A; \mathbf{s}^N)$ is independent of the parameter value $A$, i.e., $B(A; \mathbf{s}^N) = B(0; \mathbf{s}^N)$ for any $A$. Furthermore, the bound $B(A; \mathbf{s}^N)$ is proportional to $\sigma_v^2$; by letting $\tilde{s}[n] = A + \tilde{v}[n]$ and using (2.3), we obtain

$$ B(A; \mathbf{s}^N) = \sigma_v^2\, B(0; \tilde{s}) / N, \tag{2.8} $$

where $B(0; \tilde{s})$ denotes the Cramer-Rao bound for estimating $A$ based on any one sample of the IID sequence $\tilde{s}[n]$. Hence, since $B(A; \mathbf{s}^N)$ from (2.8) is independent of $A$, $B_{\max}(\Delta)$ and $\mathcal{L}_{\max}(\Delta)$ can be used interchangeably as figures of merit for assessing the performance of quantizer systems.
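For Gaussian $\tilde{v}[n]$ with unit variance, $B(0; \tilde{s}) = 1$ and the sample mean attains the bound, so (2.8) can be checked by Monte Carlo. This sketch assumes Gaussian noise; `sample_mean_variance` is a hypothetical helper, and the trial count is an arbitrary choice. The empirical variance should be close to $\sigma_v^2/N$ for every value of $A$.

```python
import random
import statistics


def sample_mean_variance(A, sigma_v, N, trials, seed=0):
    """Empirical variance of the sample mean of s[n] = A + sigma_v * vt[n], vt ~ N(0, 1)."""
    rng = random.Random(seed)
    means = [statistics.fmean(A + sigma_v * rng.gauss(0.0, 1.0) for _ in range(N))
             for _ in range(trials)]
    return statistics.pvariance(means)


sigma_v, N = 2.0, 50
bound = sigma_v ** 2 * 1.0 / N   # (2.8) with B(0; s~) = 1 for unit-variance Gaussian noise
for A in (0.0, 3.0):
    print(A, sample_mean_variance(A, sigma_v, N, trials=4000), bound)
```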

Table 2.1 summarizes the performance limits for a number of important scenarios. As we show in this chapter, in any of these scenarios the worst-case information loss can be conveniently characterized as a function of the peak SNR $\chi$. According to Table 2.1, pseudo-noise control inputs with properly chosen power levels provide performance improvements over control-free systems in any admissible noise. Specifically, for pseudo-noise control inputs the control input can be designed so that the worst-case information loss grows only quadratically with $\chi$, while it always grows faster than quadratically in the control-free case. For scenarios where the control input is known for estimation, the associated worst-case loss can be made to grow as slowly as $\chi$ with proper control input selection. Finally, if feedback from the quantized output to the control input is available and properly used, a fixed small information loss, which does not grow with increasing $\chi$, can be achieved. In the remainder of Section 2.2 we develop the performance limits shown in Table 2.1, while in Section 2.3 we develop control selection methods and associated estimators that achieve these limits.

| Control input | Order of growth of information loss: Gaussian case | Order of growth of information loss: General case |
|---|---|---|
| Control-free | $e^{\chi^2/2}$ | faster than $\chi^2$ |
| Pseudo-noise (known statistics) | $\chi^2$ | $\chi^2$ |
| Known input | $\chi$ | $\chi$ |
| Feedback-controlled input | $1$ | $1$ |

Table 2.1: Order of growth of worst-case information loss as a function of peak SNR $\chi = \Delta/\sigma_v$ for large $\chi$ and for any $M$-level quantizer. The quantity $\Delta$ denotes the dynamic range of the unknown parameter, and $\sigma_v$ is the sensor noise power level. The Gaussian case refers to Gaussian sensor noise of variance $\sigma_v^2$. The general case refers to any admissible sensor noise.


2.2.1 Pseudo-noise Control Inputs

In this section we consider signal quantizers with control inputs $w[n]$ that correspond to sample paths of an IID process, independent of the sensor noise process $v[n]$, and determine the performance limits in estimating the unknown parameter $A$ based on observation of $\mathbf{y}^N$ from (2.5), by simply exploiting the statistical characterization of $w[n]$ at the receiver. In general, we may consider pseudo-noise control inputs that are parameterized by means of the scale parameter $\sigma_w$, i.e., $w[n] = \sigma_w \tilde{w}[n]$, where $\tilde{w}[n]$ is an admissible IID noise sequence with PDF $p_{\tilde{w}}(\cdot)$. Our goal is to select the pseudo-noise level $\sigma_w$ so as to optimize performance in terms of the associated worst-case information loss.³

The  Cramer-Rao  bound  for  all  unbiased  estimates  of  the  parameter  A  based  on  obser¬ 
vation  of  the  vector  yN  is  defined  as  [31] 


d2  lnP(yN;A) 
8A 2 


where  P(yN;  A)  is  the  associated  likelihood  function,  denoting  the  probability  that  the 
particular  vector  yN  is  observed  from  (2.1)  given  that  the  unknown  parameter  takes  the 


3The  scaling  factor  <rw  is  a  measure  of  the  strength  of  the  noise  process  w[n}.  For  cases  where  the  noise 
variance  exists,  c2w  denotes  the  power  of  the  pseudo-noise  signal  to  within  a  scaling. 
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value  A.  In  particular,  the  log-likelihood  function  satisfies 

$$ \ln P(y^N; A) = \sum_{i=1}^{M} \mathcal{K}_{Y_i}(y^N)\, \ln \Pr(y[n] = Y_i; A) \qquad (2.9) $$

where 𝒦_{Y_i}(y^N) denotes the number of entries in y^N that are equal to Y_i. Since

$$ \alpha[n] = v[n] + w[n] \qquad (2.10) $$

is an IID sequence, B(A; y^N) satisfies the condition

$$ B(A; y^N) = \frac{1}{N}\, B(A; y), \qquad (2.11) $$

where  B  (A;  y)  corresponds  to  the  Cramer-Rao  bound  for  estimating  A  based  on  any  one 
sample  of  the  IID  sequence  y[n].  Finally,  by  taking  the  second  partial  derivative  of  (2.9) 
with  respect  to  A  followed  by  an  expectation,  we  obtain 


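$$ B(A; y) = \left( \sum_{i=1}^{M} \frac{\left[ p_\alpha(X_{i-1} - A) - p_\alpha(X_i - A) \right]^2}{C_\alpha(X_i - A) - C_\alpha(X_{i-1} - A)} \right)^{-1} \qquad (2.12) $$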

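As an illustration of (2.12), the Python sketch below evaluates the single-sample bound B(A; y) for an arbitrary M-level quantizer. It assumes that α[n] is zero-mean Gaussian with standard deviation sigma_alpha, so that p_α and C_α are the Gaussian PDF and CDF; the function names are hypothetical and the code is not taken from the thesis.

```python
import math

def gauss_pdf(x, s):
    """PDF p_alpha of a zero-mean Gaussian with standard deviation s."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, s):
    """CDF C_alpha of a zero-mean Gaussian with standard deviation s."""
    return 0.5 * math.erfc(-x / (s * math.sqrt(2.0)))

def crb_single_sample(a, thresholds, sigma_alpha):
    """Single-sample Cramer-Rao bound (2.12) for y = F(a + alpha).

    thresholds holds X_1 < ... < X_{M-1}; X_0 = -inf and X_M = +inf are added here.
    """
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    fisher = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Probability that y[n] falls in the cell (X_{i-1}, X_i]
        prob = gauss_cdf(hi - a, sigma_alpha) - gauss_cdf(lo - a, sigma_alpha)
        # Derivative of that probability with respect to a
        dprob = gauss_pdf(lo - a, sigma_alpha) - gauss_pdf(hi - a, sigma_alpha)
        if prob > 0.0:
            fisher += dprob ** 2 / prob
    return 1.0 / fisher

# Symmetric two-level quantizer (M = 2, X_1 = 0), unit noise, A = 0
print(crb_single_sample(0.0, [0.0], 1.0))   # approximately 1.5708, i.e. pi/2
```

For M = 2 and X_1 = 0 this reproduces (2.14)-(2.15) numerically; adding thresholds can only lower the bound, since a finer quantizer carries at least as much information.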

For the system corresponding to the symmetric two-level quantizer (M = 2, X_1 = 0), i.e.,

$$ F(x) = \mathrm{sgn}\, x, \qquad (2.13) $$

the Cramer-Rao bound (2.12) reduces to

$$ B(A; y) = C_\alpha(-A) \left[ 1 - C_\alpha(-A) \right] \left[ p_\alpha(-A) \right]^{-2}. \qquad (2.14) $$

When, in addition, the PDF p_α(·) is an even function of its argument, (2.14) further specializes to

$$ B(A; y) = B(-A; y) = C_\alpha(-A)\, C_\alpha(A) \left[ p_\alpha(A) \right]^{-2}. \qquad (2.15) $$

We first consider the special case where v[n] and w[n] are IID Gaussian processes and F(·) is the symmetric two-level quantizer, and determine the pseudo-noise level that minimizes the worst-case information loss. We then consider the general case, i.e., the case M ≥ 2 where v[n] and w[n] are any IID noise processes.


Special  Case:  Gaussian  Noises  and  M  =  2 

For the system M = 2 where v[n] and w[n] are independent IID Gaussian noise sequences with variances σ_v² and σ_w², respectively, the Cramer-Rao bound (2.15) reduces to

$$ B(A; y) = 2\pi \sigma_\alpha^2\, Q\!\left(\frac{A}{\sigma_\alpha}\right) Q\!\left(-\frac{A}{\sigma_\alpha}\right) \exp\!\left(\frac{A^2}{\sigma_\alpha^2}\right), \qquad (2.16) $$

where σ_α² = σ_v² + σ_w², and Q(x) = ∫_x^∞ (1/√(2π)) e^{−t²/2} dt. Fig. 2-3 depicts the associated information loss (2.4) as a function of A for Δ = 1, σ_v = 0.1 and various σ_w levels.
Observation  of  Fig.  2-3  reveals  several  key  characteristics  of  this  type  of  quantizer-based 
processing.  Specifically,  in  this  Gaussian  sensor  noise  scenario  the  minimum  achievable 
information loss occurs for A = 0 and σ_w = 0 and equals 10 log₁₀(π/2) ≈ 2 dB. In addition, for any pseudo-noise power level σ_w the information loss is an increasing function of |A|. This property is shared by many other common noises, such as the Laplacian and the Cauchy. More importantly, as the figure reveals, proper use of pseudo-noise (σ_w ≠ 0)
can  have  a  major  impact  on  performance  in  terms  of  reducing  the  associated  worst-case 
information  loss. 
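The behavior shown in Fig. 2-3 can be reproduced with a short sketch that combines (2.16) with the information-loss definition (2.4). It assumes, as for Gaussian sensor noise, that the unquantized bound is B(A; s^N) = σ_v²/N, so the factor N cancels; the function name info_loss_db is hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail function Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def info_loss_db(a, sigma_v, sigma_w):
    """Information loss (2.4), in dB, for the two-level quantizer with Gaussian
    sensor noise (std sigma_v) and Gaussian pseudo-noise (std sigma_w).
    Valid while |a| / sigma_alpha stays below about 25 (exp overflow)."""
    sigma_alpha = math.hypot(sigma_v, sigma_w)     # sigma_alpha^2 = sigma_v^2 + sigma_w^2
    u = a / sigma_alpha
    crb_y = (2.0 * math.pi * sigma_alpha ** 2
             * q_func(u) * q_func(-u) * math.exp(u * u))   # (2.16)
    return 10.0 * math.log10(crb_y / sigma_v ** 2)         # ratio to B(A; s) = sigma_v^2

# Fig. 2-3 setting: Delta = 1, sigma_v = 0.1; the worst case sits at |A| = Delta
for sw in (0.0, 0.2, 2.0 / math.pi, 1.0):
    print(f"sigma_w = {sw:.3f}: loss at |A| = 1 is {info_loss_db(1.0, 0.1, sw):6.1f} dB")
```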

The  sensitivity  of  performance  with  respect  to  the  optimal  pseudo-noise  power  level 
is  examined  in  Fig.  2-4  for  the  Gaussian  noise  scenario.  In  particular,  the  figure  depicts 
the  additional  worst-case  information  loss  (in  dB)  that  arises  from  suboptimal  selection  of 
the  pseudo-noise  power  level.  Since  the  encoding  performance  for  the  optimally  selected 
pseudo-noise  power  level  is  used  as  a  reference,  the  additional  worst-case  information  loss  for 
the  optimal  pseudo-noise  encoder  equals  zero  dB.  From  the  figure  we  see  that  the  optimal 
aggregate  noise  level  is  well  approximated  by 


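$$ \sigma_\alpha^{\mathrm{opt}} \approx \frac{2}{\pi}\, \Delta, \qquad (2.17) $$

so that the optimal pseudo-noise level satisfies

$$ \sigma_w^{\mathrm{opt}} = \begin{cases} \sqrt{(\sigma_\alpha^{\mathrm{opt}})^2 - \sigma_v^2} & \text{if } \sigma_v < \sigma_\alpha^{\mathrm{opt}} \\ 0 & \text{otherwise} \end{cases} \qquad (2.18) $$

Figure 2-3: Information loss for a system comprising a two-level quantizer and an IID Gaussian pseudo-noise control input, for various pseudo-noise power levels σ_w. The sensor noise is IID Gaussian with variance σ_v² = 0.01.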


If σ_v ≪ Δ (high SNR), Fig. 2-4 reveals that for the fairly wide range of pseudo-noise levels

$$ \tfrac{5}{8}\, \sigma_\alpha^{\mathrm{opt}} < \sigma_\alpha < 2\, \sigma_\alpha^{\mathrm{opt}}, $$

the associated performance is inferior to that corresponding to the optimal pseudo-noise level by less than 3 dB. However, the performance degrades rapidly as the pseudo-noise level is reduced beyond 5σ_α^opt/8. For instance, for σ_α = σ_α^opt/3, there is nearly 30 dB of additional loss incurred by the suboptimal selection of the pseudo-noise level.

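A minimal sketch of the selection rule (2.17)-(2.18), together with the sensitivity curve of Fig. 2-4, is given below. It assumes Gaussian sensor noise and Gaussian pseudo-noise, and uses the fact stated above that the information loss increases with |A|, so the worst case over |A| < Δ is the limit |A| → Δ. Because the unquantized bound σ_v²/N cancels in the ratio, the additional loss depends only on σ_α/σ_α^opt. The function names are illustrative.

```python
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def worst_case_crb(delta, sigma_alpha):
    """Single-sample bound (2.16) evaluated at |A| = delta."""
    u = delta / sigma_alpha
    return 2.0 * math.pi * sigma_alpha ** 2 * q_func(u) * q_func(-u) * math.exp(u * u)

def optimal_pseudo_noise_level(delta, sigma_v):
    """Pseudo-noise standard deviation from (2.17)-(2.18)."""
    sigma_alpha_opt = 2.0 * delta / math.pi                    # (2.17)
    if sigma_v < sigma_alpha_opt:
        return math.sqrt(sigma_alpha_opt ** 2 - sigma_v ** 2)  # (2.18), first case
    return 0.0                                                 # sensor noise suffices

def additional_loss_db(ratio, delta=1.0):
    """Extra worst-case loss (dB) when sigma_alpha = ratio * sigma_alpha_opt."""
    s_opt = 2.0 * delta / math.pi
    return 10.0 * math.log10(worst_case_crb(delta, ratio * s_opt)
                             / worst_case_crb(delta, s_opt))

print(optimal_pseudo_noise_level(1.0, 0.1))        # Delta = 1, sigma_v = 0.1
for r in (1 / 3, 5 / 8, 1.0, 2.0):
    print(f"sigma_alpha = {r:.3f} x opt: +{additional_loss_db(r):.2f} dB")
```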
The information loss associated with the optimal pseudo-noise level corresponds to the best achievable performance by a particular family of pseudo-noise sources, in this particular example the family of zero-mean normal distributions. For the optimal choice of σ_w in (2.18), the worst-case information loss can be completely characterized by means of peak SNR χ. In particular, by using (2.17)-(2.18) with (2.16) in (2.4) we obtain the optimal worst-case information loss for the Gaussian scenario with pseudo-noise control, namely,

$$ \mathcal{L}_{\max}^{\mathrm{pn}}(\chi) = \begin{cases} 2\pi\, Q(\chi)\, Q(-\chi)\, e^{\chi^2} & \text{if } 0 < \chi \le \pi/2 \\ \frac{8}{\pi}\, Q(\pi/2)\, Q(-\pi/2)\, e^{\pi^2/4}\, \chi^2 & \text{if } \pi/2 < \chi \end{cases} \qquad (2.19) $$

where we indicate explicitly that in this case the worst-case information loss is a function of χ.

Figure 2-4: Additional worst-case information loss (solid) due to suboptimal pseudo-noise level selection for a two-level quantizer. The net noise sequence α[n] = v[n] + w[n] is Gaussian with variance σ_α². The "x" marks depict the additional information loss for net noise levels 5σ_α^opt/8 and 2σ_α^opt. The "o" mark depicts the additional information loss at σ_α^opt/3.

As (2.19) reveals, for estimation in Gaussian noise via a two-level quantizer system, the worst-case information loss can be made to grow quadratically with peak SNR by judicious selection of a Gaussian pseudo-noise control input. For comparison, the worst-case information loss in the absence of control input grows exponentially with peak SNR. In particular, by substituting B(A; y) from (2.16) with σ_w = 0 in (2.7), we obtain

$$ \mathcal{L}_{\max}^{\mathrm{free}}(\chi) = 2\pi\, Q(\chi)\, Q(-\chi)\, e^{\chi^2}, \qquad (2.20) $$

which grows as exp(χ²/2) for large χ (more precisely, as √(2π) exp(χ²/2)/χ). The results in (2.19)-(2.20) extend to quantizers with M > 2, i.e., the worst-case information loss grows as exp(χ²/2) for control-free systems, while it can be made to grow as χ² for appropriately chosen Gaussian pseudo-noise control inputs.

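The closed forms (2.19) and (2.20) are easy to tabulate. The sketch below (function names hypothetical) evaluates both; at χ = 10 it gives roughly 211 dB for the control-free system, the figure quoted later in this chapter, versus about 22 dB with optimal pseudo-noise.

```python
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def loss_control_free(chi):
    """Worst-case information loss (2.20) without control input (a ratio, not dB).
    math.exp overflows for chi above about 26."""
    return 2.0 * math.pi * q_func(chi) * q_func(-chi) * math.exp(chi * chi)

def loss_pseudo_noise(chi):
    """Worst-case information loss (2.19) with the optimal Gaussian pseudo-noise."""
    if chi <= math.pi / 2:
        return loss_control_free(chi)          # sigma_w = 0 is optimal in this range
    c = (8.0 / math.pi) * q_func(math.pi / 2) * q_func(-math.pi / 2) * math.exp(math.pi ** 2 / 4)
    return c * chi * chi                       # quadratic growth in chi

def to_db(x):
    return 10.0 * math.log10(x)

for chi in (1.0, 5.0, 10.0, 20.0):
    print(f"chi = {chi:4.1f}: control-free {to_db(loss_control_free(chi)):7.1f} dB, "
          f"pseudo-noise {to_db(loss_pseudo_noise(chi)):5.1f} dB")
```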
General Case: Arbitrary Noises and M ≥ 2

As we show next, proper use of a pseudo-noise control input w[n] can improve performance over the control-free system in any (admissible) sensor noise v[n] and for any M-level quantizer. Substituting (2.8) and (2.11) in (2.4) reveals that the associated information loss is independent of N. Thus, we may focus on the case N = 1 without any loss of generality. We next use B_max(Δ; σ_v, σ_w) to denote the worst-case Cramer-Rao bound (2.6), in order to make its dependence on σ_v and σ_w explicit. Since v[n] is an admissible process, the Cramer-Rao bound (2.12) is continuous in the σ_v variable, and so is B_max(Δ; σ_v, σ_w). Thus, given any fixed σ_w > 0 and Δ, for small enough σ_v we have

$$ B_{\max}(\Delta; \sigma_v, \sigma_w) \approx B_{\max}(\Delta; 0, \sigma_w). \qquad (2.21) $$

Substitution of (2.21) and (2.8) in (2.7) reveals that ℒ_max^pn(χ) ∼ χ² is achievable for large χ. Furthermore, since B_max(Δ; σ_v, σ_w) is also continuous in σ_w, for any F(·) with fixed M < ∞

$$ \inf_{\sigma_w \in [0, \infty)} B_{\max}(\Delta; 0, \sigma_w) > 0, \qquad (2.22) $$

which in conjunction with (2.8) and (2.21) implies that the worst-case information loss cannot be made to grow slower than χ² for pseudo-noise control inputs. Therefore, at high peak SNR the worst-case information loss for pseudo-noise control inputs ℒ_max^pn(χ) grows quadratically with peak SNR. In general, the sensor noise level may be fixed, in which case we are interested in selecting the pseudo-noise level σ_w as a function of the dynamic range Δ so as to minimize the worst-case information loss. From (2.21)-(2.22) the optimal worst-case information loss rate can be achieved by selecting σ_w = λΔ for some λ > 0. This is in agreement with our conclusions for the Gaussian scenario in the special case M = 2, as (2.17)-(2.19) clearly demonstrate. For comparison, in App. A.1 we show that for control-free systems corresponding to F(·) in (2.2) and for any sensor noise the worst-case information loss ℒ_max^free(χ) grows faster than χ² for large χ. Remarkably, pseudo-noise control inputs with appropriately selected power levels provide performance improvements over the control-free systems for any sensor noise at high peak SNR.


2.2.2  Known  Control  Inputs 

We next develop performance limits for scenarios where the estimator can exploit detailed knowledge of a suitably designed control waveform. In particular, we determine the minimum possible growth rate of the worst-case information loss as a function of χ, and develop control input selection strategies that achieve the minimum possible rate.

The Cramer-Rao bound for unbiased estimates of A based on y^N and given knowledge of the associated N samples of w[n] is denoted by B(A; y^N, w^N) and satisfies

$$ B(A; y^N, w^N) = \left( -E\left[ \frac{\partial^2 \ln P(y^N; A, w^N)}{\partial A^2} \right] \right)^{-1} = \left( \sum_{n=1}^{N} \left[ B(A + w[n]; y) \right]^{-1} \right)^{-1}, \qquad (2.23) $$


where B(A; y) is given by (2.12), with α replaced by v, and where P(y^N; A, w^N) denotes the associated likelihood function. As expected, the associated worst-case Cramer-Rao bound and worst-case information loss are functions of the control waveform w^N. In App. A.2 we show that, for any known control waveform selection strategy, the worst-case information loss associated with any M-level signal quantizer grows at least as fast as χ for any sensor noise distribution. This includes the optimal scheme, which selects the waveform w[n] that minimizes the worst-case information loss for any given set {Δ, σ_v, p_v(·), F(·)}.

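To make (2.23) concrete, the sketch below accumulates the per-sample Fisher information [B(A + w[n]; y)]^{-1} for a known control sequence. It assumes the two-level quantizer (2.13) and Gaussian sensor noise, for which B(A; y) is (2.16) with σ_α = σ_v; the Fisher information is computed directly as φ(u)²/(σ_v² Q(u) Q(−u)) to avoid overflow at large offsets. The names and the uniformly spaced test sweep are illustrative, not the thesis's construction.

```python
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_single(x, sigma_v):
    """1 / B(x; y) for the sign quantizer in N(0, sigma_v^2) noise, cf. (2.16)."""
    u = x / sigma_v
    if abs(u) > 30.0:
        return 0.0          # contribution is negligible; also avoids 0/0 underflow
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return pdf * pdf / (sigma_v ** 2 * q_func(u) * q_func(-u))

def crb_known_control(a, w, sigma_v):
    """Cramer-Rao bound (2.23) for A given y^N and the known control samples w."""
    return 1.0 / sum(fisher_single(a + wn, sigma_v) for wn in w)

# Delta = 1, sigma_v = 0.1, N = 20, parameter close to the edge of the range
a, sigma_v, n = 0.95, 0.1, 20
no_control = crb_known_control(a, [0.0] * n, sigma_v)
swept = crb_known_control(a, [-1.0 + 2.0 * k / (n - 1) for k in range(n)], sigma_v)
print(no_control, swept)
```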
Classes  of  periodic  waveforms  parameterized  by  the  period  K  are  appealing  candidates 
for  known  control  inputs,  since  they  are  easy  to  construct  and  can  be  chosen  so  that  the 
worst-case  information  loss  grows  at  the  minimum  possible  rate.  In  constructing  these 
classes  of  periodic  waveforms,  we  use  as  a  figure  of  merit  the  worst-case  information  loss 
for N → ∞; extensions to the finite N case are developed in App. A.2. From (2.23), the Cramer-Rao bound for estimating A based on y^N, where N is a multiple of the period K, is given by

$$ B(A; y^N, w^N) = \frac{1}{N}\, \frac{K}{\sum_{n=1}^{K} \left[ B(A + w[n]; y) \right]^{-1}}. \qquad (2.24) $$


As we show next, in order to achieve the minimum possible growth rate it suffices to select w[n] from properly constructed K-periodic classes for which there is a one-to-one correspondence between each element in the class and the period K. Optimal selection of the control input in this case is equivalent to selecting the period K that minimizes the associated worst-case information loss, or equivalently, the worst-case Cramer-Rao bound from (2.24)

$$ K_{\mathrm{opt}}(\Delta, \sigma_v) = \arg\min_{K}\ \sup_{A \in (-\Delta, \Delta)} \frac{K}{\sum_{n=1}^{K} \left[ B(A + w[n]; y) \right]^{-1}}, \qquad (2.25) $$

where B(A; y) is given by (2.12) with α replaced by v. We next develop a framework for selecting the control waveform from properly constructed classes of K-periodic waveforms for the case M = 2, which results in achieving the optimal growth rate of worst-case information loss. Then, we extend our framework to quantizers with M > 2.


Optimized  Periodic  Waveforms  for  Signal  Quantizers  with  M  =  2 

The construction of the elements of the K-periodic class in the case M = 2 is based on the observation that in the control-free scenario the worst-case information loss grows with Δ for fixed σ_v. This observation suggests that the information loss is typically largest for parameter values that are furthest from the quantizer threshold. This is strictly true, for instance, for Gaussian sensor noise, since B(A; y) in (2.16) is an increasing function of |A|. Since our objective is to optimize over the worst-case performance, a potentially appealing strategy is to construct the K-periodic waveform w[n] so as to minimize the largest distance between any A in (−Δ, Δ) and the closest effective quantizer threshold. For this reason, we consider K-periodic control inputs, which have the form of the sawtooth waveform

$$ w[n] = \delta_w \left( -\frac{K-1}{2} + n \bmod K \right), \qquad (2.26) $$

where the effective spacing between thresholds is given by δ_w = 2Δ/(K − 1). The net effect of the periodic control input (2.26) and the symmetric two-level quantizer (2.13) is equivalent to a two-level quantizer with a periodically time-varying threshold; it is important to observe that the time-varying quantizer threshold comes within δ_w/2 of any possible parameter value at least once every K samples.

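The sawtooth (2.26) and its covering property can be checked directly. In the sketch below (illustrative names), the effective threshold at time n sits at A = −w[n]; the code confirms that within one period every A in [−Δ, Δ] lies within δ_w/2 = Δ/(K − 1) of some effective threshold.

```python
import math

def sawtooth(K, delta, n_samples):
    """K-periodic control input (2.26) with spacing delta_w = 2*delta/(K-1), n = 1, 2, ..."""
    dw = 2.0 * delta / (K - 1)
    return [dw * (-(K - 1) / 2.0 + (n % K)) for n in range(1, n_samples + 1)]

def max_gap_to_threshold(w_period, delta, grid=2001):
    """Largest distance from a grid of A values in [-delta, delta] to the
    nearest effective threshold -w[n] within one period."""
    worst = 0.0
    for j in range(grid):
        a = -delta + 2.0 * delta * j / (grid - 1)
        worst = max(worst, min(abs(a + wn) for wn in w_period))
    return worst

K, delta = 9, 1.0
w = sawtooth(K, delta, K)
print(min(w), max(w))                                   # spans [-delta, delta]
print(max_gap_to_threshold(w, delta), delta / (K - 1))  # gap versus delta_w / 2
```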
For the system with F(·) given by (2.13) and w[n] given by (2.26), the optimal period K_opt is completely characterized by means of peak SNR χ; using (2.14) in (2.25) reveals that K_opt satisfies K_opt(Δ, σ_v) = K_opt(μΔ, μσ_v) for any μ > 0. For this reason, we use the one-variable function K_opt(χ) to refer to the optimal period from (2.25) for a particular χ.

In the context of the sawtooth K-periodic inputs (2.26), strategies that select K so as to keep a fixed sawtooth spacing δ_w achieve the minimum possible growth rate. In particular, in App. A.2 we show that, for any given χ, if we select the period K in (2.26) according to

$$ K = \lceil \lambda \chi + 1 \rceil, \qquad (2.27) $$

where λ can be any positive constant, the associated worst-case information loss grows linearly with χ. In general, there is an optimal λ for any particular noise PDF p_v(·), resulting in an optimal normalized sawtooth spacing. Specifically, consider the normalized spacing between successive samples of w[n] in (2.26), namely,

$$ d(\chi; K) = \frac{\delta_w}{\sigma_v} = \frac{2\chi}{K - 1}. \qquad (2.28) $$

In addition, let d_opt(χ) denote the normalized spacing associated with the optimal period K_opt(χ) from (2.25), i.e.,

$$ d_{\mathrm{opt}}(\chi) = d\big(\chi; K_{\mathrm{opt}}(\chi)\big). \qquad (2.29) $$

In App. A.2, we outline a method for finding the asymptotic optimal normalized spacing

$$ d_\infty = \lim_{\chi \to \infty} d_{\mathrm{opt}}(\chi), \qquad (2.30) $$

associated with a particular sensor noise PDF. For purposes of illustration, we also show in App. A.2 that in the special case that the sensor noise is Gaussian with variance σ_v²

$$ d_\infty \approx 2.5851, \qquad (2.31) $$

while the associated worst-case information loss is well approximated by

$$ \mathcal{L}_{\max}^{\mathrm{per}}(\chi) \approx 1.4754\, \chi \qquad (2.32) $$

for large χ. In this Gaussian scenario, if we select w[n] as in (2.26) with K = ⌈2χ/d_∞ + 1⌉, the worst-case information loss is given by (2.32) and achieves the optimal growth rate for known control waveforms. We next extend the above analysis to quantizers with M > 2.

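The linear growth guaranteed by (2.27) can be examined numerically from (2.24). The sketch below assumes Gaussian sensor noise with σ_v = 1 (so Δ = χ), sets K = ⌈2χ/d_∞ + 1⌉ with d_∞ = 2.5851 from (2.31), and approximates the supremum over A by a grid search, so it can slightly underestimate the worst case. Doubling χ should roughly double the worst-case loss. All names are hypothetical.

```python
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_unit(u):
    """sigma_v^2 / B(A; y) for the sign quantizer, with u = A / sigma_v."""
    if abs(u) > 30.0:
        return 0.0                                   # negligible contribution
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return pdf * pdf / (q_func(u) * q_func(-u))

def worst_case_loss_periodic(chi, d_inf=2.5851, step=0.05):
    """Worst-case information loss via (2.24) for the sawtooth (2.26), with K
    from (2.27) using lambda = 2 / d_inf, and sigma_v = 1 so that Delta = chi."""
    K = math.ceil(2.0 * chi / d_inf + 1.0)
    dw = 2.0 * chi / (K - 1)
    w = [dw * (-(K - 1) / 2.0 + k) for k in range(K)]   # one period of (2.26)
    worst = 0.0
    for j in range(round(2.0 * chi / step) + 1):
        a = -chi + j * step
        # (2.24) divided by the unquantized bound sigma_v^2 / N
        worst = max(worst, K / sum(fisher_unit(a + wk) for wk in w))
    return K, worst

for chi in (25.0, 50.0, 100.0):
    K, loss = worst_case_loss_periodic(chi)
    print(f"chi = {chi:5.1f}: K = {K:3d}, worst-case loss = {10 * math.log10(loss):.1f} dB")
```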

Optimized  Periodic  Waveforms  for  Signal  Quantizers  with  M  >  2 

As we have seen in the preceding section, selection of w[n] according to (2.26) for M = 2 results in a two-level quantizer with periodically time-varying thresholds uniformly spaced in [−Δ, Δ]. This selection method minimizes the maximum distance between the parameter value and the closest of the time-varying thresholds, over the dynamic range (−Δ, Δ). The same strategy can be used for M > 2, although the availability of multiple thresholds allows for reduction of the dynamic range that w[n] needs to span. We assume that all quantizer thresholds are within the dynamic range, i.e., −Δ < X_i < Δ, for i = 1, 2, …, M − 1. In this case, the effective dynamic range Δ_eff that w[n] needs to span is given by


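$$ \Delta_{\mathrm{eff}} = \max_i\, \delta_{X_i}, $$

where

$$ \delta_{X_i} = \begin{cases} X_1 + \Delta & \text{if } i = 1 \\ X_i - X_{i-1} & \text{if } 2 \le i \le M-2 \\ \Delta - X_{M-1} & \text{if } i = M-1. \end{cases} $$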


In particular, we consider using the control input (2.26) where the effective spacing between thresholds δ_w is given in terms of Δ and the quantizer thresholds X_1, X_2, …, X_{M−1}, as follows

$$ \delta_w = \max_i\, \delta_{w_i}, \qquad (2.33a) $$

where

$$ \delta_{w_i} = \begin{cases} \dfrac{2\, \delta_{X_i}}{K - 1} & \text{if } i = 1, M-1 \\[1ex] \dfrac{2\, \delta_{X_i}}{K} & \text{if } 2 \le i \le M-2 \end{cases} \qquad (2.33b) $$


For any A in (−Δ, Δ), this selection guarantees that at least one of the M − 1 time-varying quantizer thresholds is within δ_w/2 of the parameter, where δ_w is given by (2.33a). One can in principle perform the optimization (2.25) to obtain K_opt(Δ, σ_v) for any F(·) with M > 2. We should emphasize, however, that at high SNR we may often obtain an approximate estimate of performance via our results for the case M = 2. For instance, for Δ_eff/σ_v large and small enough λ in (2.27), the optimal normalized spacing and the corresponding worst-case information loss for a quantizer with M > 2 are approximately given by the respective quantities for the symmetric two-level quantizer, with χ replaced by χ_eff = Δ_eff/σ_v.

If in addition there is freedom in selecting the M − 1 quantizer thresholds, these can be selected so that δ_{w_i} = δ_{w_j} for all i and j in (2.33b), which implies that δ_w = 2Δ/[(M − 1)K − 1]. This selection guarantees that for every K successive observations, the collection of all (M − 1)K associated quantizer thresholds forms a uniformly spaced collection in [−Δ, Δ]. For instance, in the special case that the sensor noise is Gaussian, the optimal normalized spacing and the worst-case loss for large χ are given by (2.31) and (2.32), respectively, with χ/(M − 1) replacing χ on the left-hand side of (2.32). In summary, simply constructed classes of periodic control waveforms achieve the optimal information loss growth rate with peak SNR.


2.2.3  Control  Inputs  in  the  Presence  of  Feedback 

In  this  section  we  consider  the  scenario  where,  in  addition  to  knowing  the  control  waveform, 
the  estimator  has  the  option  of  using  feedback  from  past  output  observations  in  the  selection 
of future control input values. Specifically, we develop performance bounds for the problem of estimation of A based on y^N, where the control input sequence w[n] is a function of all past quantized observations. This scenario is depicted in Fig. 2-5, where w[n] = g(y^{n−1}).

Figure 2-5: Estimation based on observations from a signal quantizer, where feedback from the quantized output is used in the selection of the control input.

We next show that the worst-case information loss for any feedback-based control input strategy is lower bounded by the minimum possible information loss for the same quantizer system with w[n] = 0; in Section 2.3 we develop feedback-based control selection algorithms that effectively achieve this lower bound. Examination of the Cramer-Rao bound (2.23) reveals that for any A in (−Δ, Δ) we can obtain information loss equal to ℒ(A_0) by selecting w[n] = A_0 − A. In particular, if there exists a parameter value A_* for which B(A; y) ≥ B(A_*; y) for all A in (−∞, ∞), where B(A; y) is given by (2.12) with α replaced by v, then using (2.23) we obtain

$$ B(A; y^N, w^N) \ge \frac{B(A_*; y)}{N}, \qquad (2.34) $$

with equality achieved for w[n] = A_* − A for n = 1, 2, …, N. This control input results in

$$ \mathcal{L}(A; w^N) \ge \mathcal{L}(A; A_* - A) = \mathcal{L}(A_*), \qquad (2.35) $$

where ℒ(A) is given by (2.4), and where B(A; y) is given by (2.12) with α replaced by v.

The  minimum  information  loss  from  (2.35)  decreases  as  the  number  of  quantization 
levels increases. In App. A.3 we show that, as we would expect, the minimum information loss ℒ(A_*) tends to zero (in dB) as the number of quantization levels approaches infinity for any sensor noise.

For a number of common sensor noises the control-free information loss for the system corresponding to M = 2 is minimized at the negative of the median of the PDF p_v(·), i.e., C_v(−A_*) = 1/2. The corresponding minimum information loss (2.35) can be obtained by evaluating (2.4) at A = A_*, while employing (2.8) and (2.14) for σ_w = 0, namely,

$$ \mathcal{L}(A_*) = \left[ 4\, p_{\tilde v}^2(-A_*/\sigma_v)\, B(0; \tilde s) \right]^{-1}, \qquad (2.36) $$

which is actually independent of σ_v and Δ, since −A_*/σ_v equals the median of the normalized noise PDF p_ṽ(·).

Figure 2-6: Minimum possible information loss as a function of quantization levels for a uniform quantizer in IID Gaussian noise. For any given M, the threshold spacing is selected so as to minimize this loss.

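Because (2.36) depends only on the normalized noise PDF, it can be evaluated once per noise family. The sketch below assumes unit-scale Gaussian, unit-variance Laplacian, and unit-scale Cauchy PDFs, each with median zero so that A_* = 0, computes B(0; s̃) as the inverse of the location Fisher information ∫ p′(x)²/p(x) dx by midpoint quadrature with central differences, and returns the resulting minimum loss. The choice of noises and all names are illustrative; the exact values are π/2, 1, and π²/8, respectively.

```python
import math

def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplacian_pdf(x, b=1.0 / math.sqrt(2.0)):
    return math.exp(-abs(x) / b) / (2.0 * b)        # unit variance when b = 1/sqrt(2)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def location_fisher(pdf, lo=-30.0, hi=30.0, steps=60_000, h=1e-5):
    """Midpoint-rule estimate of J = integral of p'(x)^2 / p(x) dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = pdf(x)
        if p > 0.0:
            dp = (pdf(x + h) - pdf(x - h)) / (2.0 * h)   # central difference
            total += dp * dp / p
    return total * dx

def min_loss_two_level(pdf, median=0.0):
    """(2.36): [4 p(median)^2 B(0; s)]^{-1}, with B(0; s) = 1 / J."""
    return location_fisher(pdf) / (4.0 * pdf(median) ** 2)

results = {}
for name, pdf in (("Gaussian", gaussian_pdf), ("Laplacian", laplacian_pdf), ("Cauchy", cauchy_pdf)):
    results[name] = min_loss_two_level(pdf)
    print(f"{name:9s}: {results[name]:.4f} ({10 * math.log10(results[name]):.2f} dB)")
```

For Laplacian noise the sign quantizer loses nothing at the median, which is consistent with the sample median being efficient for that noise.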
Special  Case:  Gaussian  Sensor  Noise 

In  the  case  that  the  sensor  noise  is  Gaussian,  the  minimum  information  loss  (2.35)  decays 
rapidly  to  zero  as  more  quantization  levels  are  introduced.  In  Fig.  2-6  we  plot  the  minimum 
possible  information  loss  through  any  uniform  M-level  quantizer  for  various  values  of  M,  in 
the  presence  of  IID  Gaussian  noise.  From  the  figure  it  is  apparent  that  a  few  quantization 
levels  suffice  to  effectively  eliminate  the  minimum  information  loss  due  to  quantizer-based 
processing. 
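A curve of the kind shown in Fig. 2-6 can be sketched by minimizing the single-sample loss over both the operating point A_* (attainable with feedback, by (2.35)) and the threshold spacing. The code below assumes unit-variance Gaussian noise, so the unquantized bound per sample is 1, symmetric uniform thresholds X_i = (i − M/2)s, and coarse grid searches; it is an illustration, not the procedure used to produce the figure, and its names are hypothetical.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def big_phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def quantized_fisher(a, thresholds):
    """Fisher information of F(a + v), v ~ N(0, 1), i.e. the inverse of (2.12)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        prob = big_phi(hi - a) - big_phi(lo - a)
        if prob > 0.0:
            total += (phi(lo - a) - phi(hi - a)) ** 2 / prob
    return total

def min_loss_db(M):
    """Smallest information loss (dB) of a uniform M-level quantizer over a
    grid of spacings s and operating points a."""
    best = math.inf
    for k in range(1, 61):
        s = 0.05 * k
        thresholds = [(i - M / 2.0) * s for i in range(1, M)]
        for j in range(-10, 11):
            best = min(best, 1.0 / quantized_fisher(j * s / 20.0, thresholds))
    return 10.0 * math.log10(best)

for M in (2, 3, 4, 8):
    print(M, round(min_loss_db(M), 3))
```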

For the two-level quantizer (2.13) in this Gaussian scenario, use of (2.16) for σ_α = σ_v in (2.7) reveals that A_* = 0. In this case, (2.34) reduces to

$$ B(A; y^N, w^N) \ge \frac{B(0; y)}{N} = \frac{\pi \sigma_v^2}{2N}, \qquad (2.37) $$

while from (2.36) the information loss for any parameter value A is lower-bounded as follows

$$ \mathcal{L}(A; w^N) \ge \mathcal{L}(0) = \frac{\pi}{2}, \qquad (2.38) $$

which corresponds to an information loss of about 2 dB.

Figure 2-7: Worst-case information loss over |A| < Δ for a two-level quantizer in zero-mean IID Gaussian noise of variance σ_v², with no control input (solid), pseudo-noise control inputs (upper dashed), and known periodic control waveforms (middle dashed). The dotted curve depicts approximation (2.32). The lower dashed line depicts the minimum possible information loss (≈ 2 dB) for any control input scheme.

Fig.  2-7  depicts  the  worst-case  information  loss  for  the  system  corresponding  to  M  =  2 
in  the  context  of  Gaussian  sensor  noise  and  the  various  control  input  scenarios  that  we  have 
examined.  As  the  figure  reflects,  the  performance  of  the  control-free  system  (solid  curve) 
degrades  rapidly  as  the  peak  SNR  is  increased.  The  benefits  of  pseudo-noise  control  inputs 
(upper  dashed  curve)  at  high  peak  SNR  are  clearly  evident,  and  known  periodic  control 
inputs  provide  additional  performance  benefits  (middle  dashed  curve)  over  pseudo-noise 
control inputs. In particular, the associated worst-case information loss increases linearly with peak SNR, as the accurate approximation (2.32) reveals. Finally, in the presence of feedback from the quantized output to the control input, the performance is lower bounded by the minimum possible information loss of 2 dB, which is independent of χ. In Section 2.3 we develop control selection strategies and associated estimators that meet all these bounds.

2.3  Efficient  Estimation 

In  this  section  we  develop  control  input  selection  strategies  and  associated  estimators  which 
achieve  the  performance  limits  computed  in  Section  2.2.  A  natural  measure  of  performance 
of a specific system comprising a control input, a quantizer, and a particular estimator is the MSE loss, which we define as the ratio of the actual MSE of a particular estimator of A based on observation of y^N to the Cramer-Rao bound for estimating A from observation of s^N. In case an efficient estimator of A based on s^N exists, the notion of the MSE loss of any given estimator of A given y^N has an alternative, appealing interpretation. In this case, the MSE loss represents the additional MSE in dB that arises from estimating A using this particular estimator on y^N, instead of efficiently estimating A via s^N. Analogously to ℒ_max in (2.7), the worst-case MSE loss of an estimator is defined as the supremum of the MSE loss function over the range |A| < Δ.

In this section we construct estimators for which the corresponding MSE loss asymptotically achieves the associated information loss, for each of the control input scenarios of Section 2.2. We examine the control-free and pseudo-noise control scenarios first, and then develop estimators applicable to known K-periodic control inputs. Finally, in the context of feedback we develop control input selection strategies and associated estimators which achieve the minimum possible information loss for any given system.

2.3.1  Pseudo-noise  Control  Inputs 

For pseudo-noise control inputs, the maximum-likelihood (ML) estimator of A based on y^N over the restricted dynamic range |A| ≤ Δ satisfies

$$ \hat{A}_{\mathrm{ML}}(y^N; \Delta) = \arg\max_{|\theta| \le \Delta} \ln P(y^N; \theta), \qquad (2.39) $$

where ln P(y^N; θ) is the log-likelihood function given by (2.9). We first examine ML estimation for the system with M = 2, and then construct estimators for signal quantizers with M > 2. Estimators of A for control-free systems can be readily obtained as a special case of the estimators of A for the associated systems with pseudo-noise control inputs by setting σ_w = 0.

ML  Estimation  for  Signal  Quantizers  with  M  =  2  in  IID  Noise 

If F(·) is given by (2.13) and α[n] is admissible, the ML estimator (2.39) can be found in closed form, by setting to zero the partial derivative of the log-likelihood function (2.9) with respect to A, viz.,

$$ \hat{A}_{\mathrm{ML}}(y^N; \Delta) = \mathcal{I}_\Delta\big( \hat{A}_{\mathrm{ML}}(y^N; \infty) \big), \qquad (2.40) $$

where ℐ_Δ(·) is the following piecewise-linear limiter function

$$ \mathcal{I}_\Delta(x) = \begin{cases} x & \text{if } |x| \le \Delta \\ \Delta\, \mathrm{sgn}(x) & \text{otherwise.} \end{cases} \qquad (2.41) $$


The function Â_ML(y^N; ∞) denotes the ML estimate of A from y^N when there are no restrictions imposed on the dynamic range of the unknown parameter A.⁴ In particular,

$$ \hat{A}_{\mathrm{ML}}(y^N; \infty) = \arg\max_{\theta} \ln P(y^N; \theta) = -C_\alpha^{-1}\!\left( \frac{\mathcal{K}_{Y_1}(y^N)}{N} \right), \qquad (2.42) $$

where C_α^{-1}(·) in (2.42) is the inverse of C_α(·), and 𝒦_{Y_1}(y^N) denotes the number of elements in y^N that are equal to Y_1. In the special case that w[n] and v[n] are zero-mean IID Gaussian noise sequences with variances σ_w² and σ_v², respectively, (2.42) reduces to

$$ \hat{A}_{\mathrm{ML}}(y^N; \infty) = \sigma_\alpha\, Q^{-1}\!\left( \frac{\mathcal{K}_{Y_1}(y^N)}{N} \right). \qquad (2.43) $$


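A minimal sketch of the closed-form ML estimator (2.40)-(2.43) for the two-level quantizer with Gaussian sensor noise and Gaussian pseudo-noise follows. It uses statistics.NormalDist from the Python standard library for the inverse CDF, treats the degenerate cases 𝒦_{Y_1}(y^N) ∈ {0, N} (where (2.42) is ±∞) by letting the limiter (2.41) return ±Δ, and compares a Monte-Carlo MSE with B(A; y)/N from (2.16). The parameter values loosely mirror Fig. 2-8; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def ml_estimate_two_level(y, sigma_alpha, delta):
    """ML estimate (2.40) from sign-quantizer outputs y (entries +1 or -1)."""
    k1 = sum(1 for yn in y if yn < 0)        # K_{Y_1}(y^N): outputs equal to Y_1 = -1
    p_hat = k1 / len(y)                      # estimate of C_alpha(-A)
    if p_hat == 0.0:
        unrestricted = math.inf              # every output is +1
    elif p_hat == 1.0:
        unrestricted = -math.inf             # every output is -1
    else:
        # (2.42)-(2.43): -C_alpha^{-1}(p) = sigma_alpha * Q^{-1}(p)
        unrestricted = -sigma_alpha * NormalDist().inv_cdf(p_hat)
    return max(-delta, min(delta, unrestricted))   # limiter (2.41)

def simulate(a, sigma_v, sigma_w, n, rng):
    """Draw y[n] = F(A + v[n] + w[n]) with F from (2.13); inputs <= 0 map to Y_1 = -1."""
    return [1 if a + rng.gauss(0.0, sigma_v) + rng.gauss(0.0, sigma_w) > 0 else -1
            for _ in range(n)]

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

rng = random.Random(2024)
a, sigma_v, sigma_w, delta, n = 0.3, 0.1, 2.0 / math.pi, 1.0, 1000
sigma_alpha = math.hypot(sigma_v, sigma_w)
errors = [ml_estimate_two_level(simulate(a, sigma_v, sigma_w, n, rng), sigma_alpha, delta) - a
          for _ in range(200)]
mse = sum(e * e for e in errors) / len(errors)

u = a / sigma_alpha
crb = 2.0 * math.pi * sigma_alpha ** 2 * q_func(u) * q_func(-u) * math.exp(u * u) / n
print(f"MSE / CRB = {mse / crb:.2f}")        # close to 1 for A well inside (-Delta, Delta)
```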
For any parameter value A in the range (−Δ, Δ), the Cramer-Rao bound (2.14) is a reasonable predictor of the MSE performance of the ML estimator (2.40)-(2.42) provided that the number of observations N is large enough. Indeed, as shown in App. A.4, for any A ∈ (−Δ, Δ), the ML estimator (2.40)-(2.42) is asymptotically efficient in the sense that it achieves the Cramer-Rao bound for unbiased estimates (2.14) for large enough N, i.e.,

$$ \lim_{N \to \infty} N\, E\left[ \left( \hat{A}_{\mathrm{ML}}(y^N; \Delta) - A \right)^2 \right] = B(A; y). $$

⁴ Note that (2.40) does not necessarily hold for M > 2.

Although the ML estimate (2.40)-(2.42) is asymptotically unbiased and efficient for any A in (−Δ, Δ), the associated MSE does not converge to the Cramer-Rao bound uniformly in A as N grows. Specifically, for any fixed N, no matter how large, there exist parameter values close enough to the boundaries ±Δ for which the ML estimator has significant bias,⁵ in which case (2.14) should not be expected to accurately predict the associated MSE of the ML estimator. This is clearly reflected in Fig. 2-8, where the actual MSE loss for Â_ML(y^N; Δ) is also depicted alongside the associated information loss for the Gaussian noise scenario. In particular, the dashed and solid lines depict the MSE loss from Monte-Carlo simulations for the ML estimator (2.40)-(2.42), in the absence (σ_w = 0) and presence (σ_w = 2/π) of pseudo-noise control input, respectively, for σ_v = 0.1, Δ = 1, and N = 100, 10⁴. As we can see in Fig. 2-8, when the pseudo-noise level is σ_w = 2/π the worst-case MSE loss is about 21 dB. However, in the absence of a control input, the worst-case MSE loss is about 36 dB for N = 100, and 55 dB for N = 10⁴. For both values of N the Cramer-Rao bound (2.14) is applicable for only a subset of the dynamic range, whose size increases with N. In fact, since the ML estimator is asymptotically efficient for any |A| < Δ with respect to the Cramer-Rao bound (2.14) for unbiased estimates, the worst-case MSE loss for the control-free system increases with N towards the associated worst-case information loss (2.20), which is approximately 211 dB.

ML Estimation for Signal Quantizers with M > 2 in IID Gaussian Noise

For the estimation problem (2.1)-(2.2) where F(·) is an M-level quantizer and α[n] is an IID sequence, the set of sufficient statistics reduces to 𝒦_{Y_1}(y^N), …, 𝒦_{Y_{M−1}}(y^N) (cf. (2.9)).

⁵ By incorporating the bias of the ML estimator (2.40)-(2.42) it is possible to obtain a Cramer-Rao bound that directly applies to the associated MSE. An even tighter bound can be obtained by properly combining three separate Cramer-Rao bounds, each describing the effects of a piecewise-linear region of the soft limiter ℐ_Δ(·) on Â_ML(y^N; ∞) in (2.40).

Figure 2-8: MSE loss from Monte-Carlo simulations for a system comprising a Gaussian pseudo-noise control input, a two-level quantizer, and the ML estimator (2.40)-(2.42) for Δ = 1, σ_v = 0.1 and various pseudo-noise power levels. The dashed curves depict the MSE loss of Â_ML(y^N; Δ) in the absence of control input (i.e., σ_w = 0); upper curve: N = 10⁴, lower curve: N = 100. The solid curves depict the MSE loss of Â_ML(y^N; Δ) for σ_w = 2/π, and for N = 100, 10⁴. For comparison, the associated information loss functions are depicted by the dotted curves (also shown in Fig. 2-3).

For the special case that α[n] is Gaussian with variance σ_α², we develop in App. A.5 an EM algorithm [14] for obtaining the ML estimate (2.39). This algorithm takes the following form:

(2.44)

initialized with Â^{(0)}_EM = 0. Provided that the log-likelihood function does not possess multiple local maxima, (2.44) provides the ML estimate (2.39), i.e.,

$$\hat{A}_{ML}(y^N;\Delta) = \lim_{k\to\infty}\hat{A}^{(k)}_{EM}.$$

Empirical evidence suggests that lim_{k→∞} Â^{(k)}_EM obtained via the algorithm (2.44) is asymptotically efficient, i.e., it achieves (2.12) for large N. Consequently, the use of information loss as an accurate predictor of MSE loss is also justified in this scenario.

Efficient  Estimation  for  Signal  Quantizers  with  M  >  2  in  IID  Noise 

In general, there is no computationally efficient method for obtaining the ML estimate (2.39) of A in nonGaussian noise via a signal quantizer with M > 2. In this section we present an alternative class of elementary estimators that can be shown to be asymptotically efficient for any admissible noise PDF p_α(·), in the sense that for any |A| < Δ the MSE of the estimator approaches the bound (2.12) for large N.

Without loss of generality we may view the output of the quantizer F(·) in (2.2) as the collection of the outputs of M − 1 two-level quantizers generating the following observed sequences

$$y_i[n] = \operatorname{sgn}\left(x[n] - X_i\right), \quad i = 1, 2, \dots, M-1,$$

where x[n] = s[n] + α[n] (cf. Fig. 2-2) and the X_i's are the thresholds of the quantizer. Consider the ML estimates of A formed from each of these binary sequences, namely,

$$\hat{A}_i = \mathcal{I}_\Delta\left(\hat{A}_{ML}(y_i^N;\infty) + X_i\right), \quad i = 1, 2, \dots, M-1, \qquad (2.45)$$

where

$$y_i^N = \left[y_i[1]\ y_i[2]\ \cdots\ y_i[N]\right]^T,$$

and where I_Δ(·) is given by (2.41), and Â_ML(·; ∞) is given by (2.42) with α replaced by v.
In App. A.6 we show that the joint cumulative distribution of

$$\hat{\mathbf{A}} = \left[\hat{A}_1\ \hat{A}_2\ \cdots\ \hat{A}_{M-1}\right]^T \qquad (2.46)$$

approaches the cumulative distribution of a Gaussian random vector with mean A𝟏 (where 𝟏 denotes a vector of 1's) and covariance matrix C/N, whose inverse is given by (A.39). We also show in the appendix that if we use

$$\hat{A} = \left(\mathbf{1}^T C^{-1}\mathbf{1}\right)^{-1}\mathbf{1}^T C^{-1}\hat{\mathbf{A}}, \qquad (2.47)$$

where C = C(Â_i) for some 1 ≤ i ≤ M − 1, the estimator Â is asymptotically efficient, i.e.,

$$\lim_{N\to\infty} N\,E\left[\left(\hat{A} - A\right)^2; A\right] = B(A; y), \qquad (2.48)$$

where B(A; y) is given by (2.12). In practice, in computing C we may select the value of i for which B(Â_i; y_i^N) is minimum, so as to speed up the convergence of the MSE to the asymptotic performance predicted by (2.48). In summary, the estimator first obtains the set (2.46) by means of (2.45) and (2.41)-(2.42). It then selects the value of i for which B(Â_i; y_i^N) is minimized and forms C = C(Â_i). Finally, it substitutes the vector (2.46) and C in (2.47) to obtain the asymptotically efficient estimate Â.
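The three-step procedure summarized above can be sketched in Python for the Gaussian case. The covariance matrix used here is our own delta-method approximation, N·Cov(Â_j, Â_k) ≈ σ²[Q(max(u_j, u_k)) − Q(u_j)Q(u_k)]/(φ(u_j)φ(u_k)) with u_j = (X_j − A)/σ. Its diagonal is the single-threshold bound. It is assumed to play the role of C in (2.47) and is not a transcription of (A.39). All function names are hypothetical. Thresholds are assumed sorted in ascending order, and each quantizer output is represented by the number of thresholds lying below the sample.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard Gaussian; Q(u) = STD.cdf(-u)


def per_threshold_estimates(bins, thresholds, sigma, delta):
    """Estimates in the form of (2.45): A_i = I_Delta(X_i + sigma * Q^{-1}(K_{-1}(y_i) / N)).

    bins[n] is the number of thresholds below x[n], so y_i[n] = -1 iff bins[n] < i.
    """
    n = len(bins)
    estimates = []
    for i, x_i in enumerate(thresholds, start=1):
        frac = sum(1 for b in bins if b < i) / n          # K_{-1}(y_i^N) / N
        if frac == 0.0:
            a = delta                                     # all +1: limiter saturates at +Delta
        elif frac == 1.0:
            a = -delta                                    # all -1: limiter saturates at -Delta
        else:
            a = x_i + sigma * STD.inv_cdf(1.0 - frac)     # Q^{-1}(q) = Phi^{-1}(1 - q)
        estimates.append(max(-delta, min(delta, a)))
    return estimates


def asymptotic_cov(a, thresholds, sigma):
    """Assumed delta-method approximation of N * Cov(A_j, A_k) at parameter value a."""
    u = [(x - a) / sigma for x in thresholds]
    pdf = [STD.pdf(t) for t in u]
    return [[sigma ** 2 * (STD.cdf(-max(uj, uk)) - STD.cdf(-uj) * STD.cdf(-uk)) / (pj * pk)
             for uk, pk in zip(u, pdf)]
            for uj, pj in zip(u, pdf)]


def solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting (small systems)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def combined_estimate(bins, thresholds, sigma, delta):
    """(2.47): A_hat = (1^T C^-1 1)^-1 1^T C^-1 A_vec, with C = C(A_i) for the best i."""
    a_vec = per_threshold_estimates(bins, thresholds, sigma, delta)
    # index with the smallest single-threshold bound, i.e. diagonal entry i of C(A_i)
    best = min(range(len(a_vec)),
               key=lambda i: asymptotic_cov(a_vec[i], thresholds, sigma)[i][i])
    z = solve(asymptotic_cov(a_vec[best], thresholds, sigma), [1.0] * len(a_vec))  # C^-1 1
    weights = [zi / sum(z) for zi in z]
    return sum(w * a for w, a in zip(weights, a_vec)), weights


rng = random.Random(1)
A, sigma, th = 0.3, 0.5, [-0.5, 0.0, 0.5]
xs = [A + rng.gauss(0.0, sigma) for _ in range(20000)]
bins = [sum(1 for t in th if t < x) for x in xs]
estimate, weights = combined_estimate(bins, th, sigma, delta=1.0)
```

With a single threshold the weight vector collapses to [1.0], and the combined estimate reduces to the binary estimate (2.45). This is a simple consistency check.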


2.3.2  Known  Control  Inputs 


In this section we construct estimators that exploit detailed knowledge of the applied control waveform. In particular, in the context of K-periodic control inputs that are known for estimation, we develop estimators that are asymptotically efficient in the sense that they asymptotically achieve (2.23).

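For IID Gaussian sensor noise, the ML estimate of A from y^N given a control vector w^N, where w[n] is a K-periodic sequence and N is a multiple of K, can be obtained as a special case of the EM algorithm presented in App. A.5. In particular, the EM algorithm takes the following form:

$$\hat{A}^{(k+1)}_{EM} = \mathcal{I}_\Delta\left(\hat{A}^{(k)}_{EM} + \frac{\sigma_v}{\sqrt{2\pi}\,N}\sum_{\ell=1}^{K}\sum_{m=1}^{M}\mathcal{K}_{Y_m}\!\left(y^{\bar{N}}[\ell]\right)\frac{\exp\left(-\frac{\left(X_{m-1}-\hat{A}^{(k)}_{EM}-w[\ell]\right)^2}{2\sigma_v^2}\right)-\exp\left(-\frac{\left(X_m-\hat{A}^{(k)}_{EM}-w[\ell]\right)^2}{2\sigma_v^2}\right)}{Q\left(\frac{X_{m-1}-\hat{A}^{(k)}_{EM}-w[\ell]}{\sigma_v}\right)-Q\left(\frac{X_m-\hat{A}^{(k)}_{EM}-w[\ell]}{\sigma_v}\right)}\right) \qquad (2.49a)$$

and

$$\hat{A}_{ML} = \lim_{k\to\infty}\hat{A}^{(k)}_{EM}, \qquad (2.49b)$$

where X_0 = −∞, X_M = ∞, N̄ = N/K, and y^{N̄}[ℓ] is the N̄ × 1 vector comprising the elements of the ℓth K-decimated subsequence, i.e.,

$$y^{\bar{N}}[\ell] = \left[y[\ell]\ y[K+\ell]\ \cdots\ y[N-K+\ell]\right]^T, \quad \ell = 1, 2, \dots, K. \qquad (2.50)$$

Empirical evidence suggests that the estimate resulting from the EM algorithm (2.49) is asymptotically efficient, i.e., it achieves the Cramer-Rao bound (2.24) for large enough N.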

Asymptotically efficient estimators in the context of nonGaussian sensor noises can be obtained in a fashion similar to those developed in App. A.6. Specifically, in the case M = 2, we may consider the vector Â in (2.46) where we use for Â_i the ML estimate of A given the ith K-decimated subsequence from (2.50), i.e.,

$$\hat{A}_i = \mathcal{I}_\Delta\left(\hat{A}_{ML}\left(y^{\bar{N}}[i];\infty\right) - w[i]\right), \quad i = 1, 2, \dots, K, \qquad (2.51)$$

and where I_Δ(·) and Â_ML(·; ∞) are given by (2.41) and (2.42), respectively. The Â_i's from (2.51) are independent random variables, since for any i ≠ j, y^{N̄}[i] and y^{N̄}[j] are independent random vectors. Therefore, the corresponding vector Â from (2.46) is asymptotically Gaussian (in terms of its cumulative distribution), with diagonal covariance matrix C/N̄; the (i, i)th entry of the matrix C equals B(A + w[i]; y[i]), where B(A; y) is given by (2.12) with α replaced by v. Consequently, an asymptotically efficient estimate is provided by Â from (2.47). For faster convergence of the MSE to the asymptotic performance, the covariance matrix is evaluated as C = C(Â_i), where i is the index that minimizes B(Â_i + w[i]; y^{N̄}[i]).
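For M = 2 the combination of the K decimated estimates reduces to inverse-variance weighting, because C is diagonal. The Python sketch below makes several assumptions. The sensor noise is Gaussian, so that B has the closed form σ_v²Q(u)[1 − Q(u)]/φ(u)² with u = (A + w[i])/σ_v. The quantizer is a sign quantizer with its threshold at zero. The control period `w` is hypothetical and chosen only for illustration; it is not the sequence (2.26). At least one subsequence is assumed to operate within a few σ_v of the threshold.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def binary_information(u, sigma):
    """1 / B for a sign quantizer at normalized offset u = (A + w) / sigma, Gaussian noise."""
    den = sigma ** 2 * STD.cdf(-u) * STD.cdf(u)        # sigma^2 Q(u) (1 - Q(u))
    return 0.0 if den == 0.0 else STD.pdf(u) ** 2 / den


def decimated_estimate(y, w, sigma, delta):
    """Combine the K decimated-subsequence estimates (2.51) with a diagonal C.

    y[n] in {-1, +1} is the output at time n (0-based); w holds one period of the
    known control input, applied as w[n % K].
    """
    k = len(w)
    est = []
    for i in range(k):
        sub = y[i::k]                                  # i-th K-decimated subsequence
        frac = sub.count(-1) / len(sub)                # K_{-1}(y[i]) / N_bar
        if frac == 0.0:
            a = delta
        elif frac == 1.0:
            a = -delta
        else:
            a = sigma * STD.inv_cdf(1.0 - frac) - w[i]  # ML of A + w[i], minus known offset
        est.append(max(-delta, min(delta, a)))
    # reference index minimizing B(A_i + w[i]); evaluate C = C(A_ref) on the diagonal
    ref = max(range(k), key=lambda i: binary_information((est[i] + w[i]) / sigma, sigma))
    info = [binary_information((est[ref] + w[j]) / sigma, sigma) for j in range(k)]
    total = sum(info)
    weights = [f / total for f in info]                # inverse-variance weights
    return sum(wt * a for wt, a in zip(weights, est)), weights


rng = random.Random(7)
A, sigma, w = -0.35, 0.05, [-0.8, -0.4, 0.0, 0.4, 0.8]   # hypothetical control period
y = [1 if A + w[n % len(w)] + rng.gauss(0.0, sigma) >= 0.0 else -1 for n in range(5000)]
estimate, weights = decimated_estimate(y, w, sigma, delta=1.0)
```

Subsequences that saturate receive essentially zero weight. The estimate is therefore dominated by the subsequence whose offset A + w[i] lies closest to the threshold.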

Asymptotically efficient estimators can also be constructed for signal quantizers with M > 2 and known K-periodic inputs in nonGaussian sensor noise. Specifically, for each M-ary subsequence from (2.50) we may first apply the algorithm (2.45)-(2.47) to obtain K statistically independent estimates of A. By combining these K estimates in a fashion similar to the method used in the case M = 2 for combining the estimates (2.51), we obtain an asymptotically efficient estimator of A based on y^N given w^N.

2.3.3  Control  Inputs  in  the  Presence  of  Feedback 

In Section 2.2.3 we showed that the worst-case information loss of a system composed of a signal quantizer and an additive control input is lower-bounded by the minimum possible information loss of the same system in the control-free case. In this section we develop control input selection strategies based on past quantized output samples and construct associated estimators that effectively achieve this bound.

Feedback Control and Estimation for Signal Quantizers with M = 2

We first examine the Gaussian sensor noise scenario with M = 2 in detail. As (2.38) reveals, the associated control-free information loss is minimized for w[n] = −A. Although this control input selection is not permissible (A is unknown), it suggests a viable control input selection method based on past quantized observations. Specifically, if Â[n] is any consistent estimator of A based on y^n, a reasonable choice for the control input sequence is

$$w[n] = -\hat{A}[n-1]. \qquad (2.52)$$


Assuming the control sequence is selected according to (2.52), the ML estimator at time n satisfies

$$\hat{A}_{ML}[n] = \arg\max_{|\theta|\le\Delta}\ \sum_{m=1}^{n}\ln Q\left(\frac{y[m]\left(\hat{A}_{ML}[m-1]-\theta\right)}{\sigma_v}\right).$$


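In App. A.5 we show that in the Gaussian scenario the ML estimate of A based on y^n for n = 1, 2, ... can be obtained using the following EM algorithm:

$$\hat{A}^{(k+1)}_{EM}[n] = \mathcal{I}_\Delta\left(\hat{A}^{(k)}_{EM}[n] + \frac{\sigma_v}{\sqrt{2\pi}\,n}\sum_{m=1}^{n} y[m]\,\frac{\exp\left(-\frac{\left(\hat{A}_{ML}[m-1]-\hat{A}^{(k)}_{EM}[n]\right)^2}{2\sigma_v^2}\right)}{Q\left(\frac{y[m]\left(\hat{A}_{ML}[m-1]-\hat{A}^{(k)}_{EM}[n]\right)}{\sigma_v}\right)}\right), \qquad (2.53a)$$

initialized with Â^{(0)}_EM[n] = Â_ML[n − 1] and Â_ML[0] = 0, where for any n,

$$\hat{A}_{ML}[n] = \lim_{k\to\infty}\hat{A}^{(k)}_{EM}[n]. \qquad (2.53b)$$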


Although empirical evidence suggests that the ML estimator obtained by means of the EM algorithm in (2.53) achieves the 2 dB information loss bound (2.38) for any A in (−Δ, Δ) for a moderate number of observations,⁶ it is rather computationally intensive: for every additional observed sample an EM algorithm has to be run. In addition, even though the number of iterations necessary for adequate convergence of the EM algorithm appears to be small for large n, the algorithm may still be impractical.

⁶There are a number of other control input selection methods and associated estimators that can approach arbitrarily close to the 2 dB bound; the systems developed in this chapter for the case M > 2 and nonGaussian noise are such an example. However, the associated MSE of these algorithms converges to the bound (2.38) considerably more slowly than that of the algorithms of this section. In fact, the number of samples required for the MSE of (2.53) with w[n] as in (2.52) to effectively achieve the 2 dB bound (2.38) increases linearly with ln(χ).

We next develop algorithms that achieve the bound (2.38) and have the additional advantage that they can be implemented very efficiently. These are based on the following observation. Once the estimate Â[n] is not changing significantly with n (i.e., the changes are small with respect to σ_v), we may assume that A + w[n + 1] is in the regime where the information loss is small. A linear estimator that approaches the 2 dB bound (2.38) can then be used.

Specifically, let z = Q(A/σ_v) and assume that |A/σ_v| < 0.1. In this regime, the truncated power series expansion provides a reasonable approximation for Q^{-1}(z), i.e.,

$$Q^{-1}(z) \approx \sqrt{\frac{\pi}{2}}\,(1 - 2z). \qquad (2.54)$$

We can use (2.54) to form a linear estimator as follows. Assume that the MSE decays inversely with the number of measurements, which implies that the asymptotic MSE loss is finite. The estimate at time n is then given as a weighted sum of the estimate at time n − 1 and an estimate arising from using the nth measurement y[n] alone, i.e.,

$$\hat{A}_L[n] = \frac{n-1}{n}\,\hat{A}_L[n-1] + \frac{1}{n}\,\hat{A}\big[n\,\big|\,y[n]\big]. \qquad (2.55)$$

The estimate based on the nth measurement alone is obtained by using (2.54) in (2.43) (with σ_w set to 0), together with the fact that w[n] = −Â_L[n − 1], i.e.,

$$\hat{A}\big[n\,\big|\,y[n]\big] = \hat{A}_L[n-1] + \sigma_v\sqrt{\frac{\pi}{2}}\,y[n]. \qquad (2.56)$$

By incorporating (2.56) in (2.55), this linear estimator takes the following iterative form:

$$\hat{A}_L[n] = \hat{A}_L[n-1] + \sigma_v\sqrt{\frac{\pi}{2}}\,\frac{y[n]}{n}. \qquad (2.57)$$

In order to obtain an algorithm that converges to the 2 dB bound (2.38) much faster than (2.57), we employ the EM algorithm (2.53) for n ≤ n_0 and the recursive algorithm (2.57) for n > n_0, i.e.,

$$\hat{A}[n] = \begin{cases}\hat{A}_{ML}[n] \text{ from (2.53)} & \text{if } n \le n_0,\\[4pt] \mathcal{I}_\Delta\left(\hat{A}[n-1] + \sigma_v\sqrt{\dfrac{\pi}{2}}\,\dfrac{y[n]}{n}\right) & \text{if } n > n_0,\end{cases} \qquad (2.58)$$

Here the control input w[n] is given by (2.52), with the estimate Â[n − 1] from (2.58) used in place of the generic estimate in (2.52). The dynamic range information is incorporated by means of I_Δ(·).
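A quick numerical check of the truncated expansion (2.54) and of the recursive step (2.57) is sketched below in Python, using the standard library only. The function names are hypothetical. Keeping one more term of the series shows that sqrt(π/2)(1 − 2Q(x)) ≈ x − x³/6. The relative error of (2.54) is therefore roughly x²/6, which is about 0.17% at |x| = 0.1.

```python
import math
from statistics import NormalDist


def q_inv(z):
    """Exact inverse of the Gaussian tail function: Q^{-1}(z) = Phi^{-1}(1 - z)."""
    return NormalDist().inv_cdf(1.0 - z)


def q_inv_truncated(z):
    """Truncated series (2.54): Q^{-1}(z) ~ sqrt(pi/2) * (1 - 2z)."""
    return math.sqrt(math.pi / 2.0) * (1.0 - 2.0 * z)


def recursive_step(prev_estimate, sigma_v, y_n, n):
    """One step of (2.57): A_L[n] = A_L[n-1] + sigma_v * sqrt(pi/2) * y[n] / n."""
    return prev_estimate + sigma_v * math.sqrt(math.pi / 2.0) * y_n / n


for x in (0.01, 0.05, 0.1):
    z = NormalDist().cdf(-x)                      # z = Q(x)
    rel_err = abs(q_inv_truncated(z) - x) / x
    print(f"x = {x:4.2f}  exact = {q_inv(z):.6f}  "
          f"truncated = {q_inv_truncated(z):.6f}  rel. error = {rel_err:.1e}")
```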

Selection of an appropriate value for n_0 is related to the peak SNR χ. In principle, the larger the peak SNR, the longer (in terms of the number of observations) it takes A − Â_ML[n] to reach the linear regime (2.54). We therefore consider the case Δ ≫ σ_v. For instance, assume we are interested in selecting n_0 so that the √MSE in Â[n_0] is less than a given fraction of σ_v (so that the truncated series approximation is valid), for example σ_v/8. For small enough n_0, the maximum MSE from n_0 observations is roughly given by the square of Δ2^{−n_0}. Requiring Δ2^{−n_0} ≤ σ_v/8, this crude MSE-based rule of thumb for selecting n_0 reduces to n_0 ≥ log2(Δ/σ_v) + 3.

The solid and dashed curves in Fig. 2-9 depict, respectively, the MSE of the ML estimator obtained by means of the EM algorithm in (2.53) and that of the computationally efficient estimator (2.58) with n_0 = 10, both based on Monte-Carlo simulations. The system parameters for this simulation are Δ = 1 and σ_v = 0.1, resulting in log2(Δ/σ_v) ≈ 3.3, while A = 0.4. In both cases the control sequence is selected according to (2.52). The lower and upper dotted lines depict B(A; s^N) and the right-hand side of (2.37), respectively. As this figure shows, both estimates effectively achieve the 2 dB loss bound (2.38) for a moderate number of observations.

In terms of the actual implementation of the estimator (2.58), for a given n_0 there are 2^{n_0} possible values of Â_ML[n_0]. These 2^{n_0} estimate values can be precomputed and stored in a lookup table. This results in an appealing, computationally efficient implementation. Given n_0 or fewer observations, the estimate is obtained from a lookup table; once the number of observations exceeds n_0, a recursive linear estimator is employed. Since n_0 grows logarithmically with χ, the number of lookup table entries for storing all possible values of Â_ML[n_0] grows only linearly with the peak SNR χ.

Figure 2-9: MSE from Monte-Carlo simulations for Â_ML[n] (solid) and Â[n] with n_0 = 10 (dashed), based on observations from a signal quantizer with M = 2 exploiting feedback according to (2.52). The lower dotted line represents the Cramer-Rao bound for estimating A based on s[n], while the upper dotted line is the 2 dB bound (2.38). Parameters: σ_v = 0.1, Δ = 1, and A = 0.4.

A similar strategy can be used in the context of quantizer systems using feedback in any sensor noise. In the general case, A_* in (2.35) may not equal zero. A reasonable extension of the control input selection method (2.52) for nonzero A_* is as follows:

$$w[n] = A_* - \hat{A}[n-1]. \qquad (2.59)$$


An estimator similar to (2.58) can be used to estimate A in this case. Specifically, for n ≤ n_0 the estimator may consist of a precomputed lookup table, while for n > n_0 a recursive estimator resulting from a truncated series expansion of C_v^{-1}(z) around z = C_v(−A_*) can be employed, namely,

$$\hat{A}[n] = \mathcal{I}_\Delta\left(\hat{A}[n-1] + \frac{1}{n}\,\frac{y[n] + 1 - 2C_v(-A_*)}{2\,p_v(-A_*)}\right), \quad n > n_0.$$

In particular, if A_* is the median of p_v(·), in which case ℒ(A_*) is given by (2.36), we have

$$\hat{A}[n] = \mathcal{I}_\Delta\left(\hat{A}[n-1] + \frac{y[n]}{2\,n\,p_v(-A_*)}\right), \quad n > n_0.$$

In general, empirical evidence suggests that the MSE loss of these algorithms practically achieves the associated ℒ(A_*) for a moderate number of observations.

Feedback  Control  and  Estimation  for  Signal  Quantizers  with  M  >  2 

For the Gaussian sensor noise scenario, the EM algorithm (2.53) can be extended to F(·) with M > 2; the resulting algorithm is a special case of the one presented in App. A.5. Empirical evidence suggests that it is also asymptotically efficient. Assuming flexibility in selecting the thresholds of the M-level quantizer, the corresponding information loss (2.35) can be obtained from Fig. 2-6. For instance, for the optimal selection of the quantizer thresholds for M = 6 we have A_* = 0; if the control input is selected according to (2.59), the EM algorithm in App. A.5 yields a worst-case MSE loss of about 0.25 dB. As with ℒ_max, the asymptotic MSE loss is independent of σ_v and Δ.

For signal quantizers with M > 2 where v[n] is any nonGaussian noise, we may use the following two-stage approach, which effectively achieves ℒ(A_*). For the first N_1 observations we may employ any consistent estimator Â_1[n] of A. For instance, we may use one of the feedback-based algorithms corresponding to the system M = 2 by ignoring all but two of the M levels of the quantized output. In the second stage, we fix w[n] = A_* − Â_1[N_1] for all n > N_1. The number N_1 determines the accuracy of the approximation

$$\mathcal{L}\left(A_* + A - \hat{A}_1[N_1]\right) \approx \mathcal{L}(A_*).$$

For any given n > N_1, we can then obtain an estimate Â_2[n] of A from

$$\left[y[N_1+1]\ \ y[N_1+2]\ \cdots\ y[n]\right]$$

by means of (2.45)-(2.47), which is asymptotically efficient with respect to ℒ(A_* + A − Â_1[N_1]). For faster convergence, the overall estimate can be a weighted sum of the estimates Â_1[N_1] and Â_2[n]. Although the associated asymptotic MSE loss can be made arbitrarily close to ℒ(A_*), these algorithms typically require significantly larger data sets to effectively achieve the desired information loss, as compared to the algorithms for M = 2 of the previous section.
Chapter 3


Static  Case  Extensions  for 
Quantizer  Bias  Control  Systems 


In a number of applications involving estimation of slowly-varying information-bearing signals, data may be collected from multiple sensors. In this case, the acquired measurements must be efficiently encoded at each sensor and, in turn, these encoded streams must be effectively combined at the host to obtain accurate signal estimates. In addition, irrespective of whether the application involves one or multiple sensors, a number of other issues may arise and may thus have to be taken into consideration. For instance, we often have available accurate signal models or other forms of prior information about the information-bearing signal. In such cases, we would be interested in exploiting any such form of additional information to improve the quality of the encodings and the associated estimates from these encodings. In addition, there are many instances where the noise at each sensor is nonstationary, or its statistical characterization is only partially known. It is important to incorporate such forms of uncertainty in the encoding and estimation algorithms and to determine the extent to which such issues may affect the overall system performance.

In this chapter we develop a number of such extensions of the systems we examined in Chapter 2. These extensions address a representative collection of cases of the signal estimation problem from digitally encoded measurements that may arise in practice. In Section 3.1 we consider estimation of a static signal from digitally encoded data obtained from multiple sensors and develop multi-sensor extensions of the signal encoding strategies and the algorithms of Chapter 2. As we show, the performance and optimal design of the encoding and estimation algorithms of many multi-sensor generalizations of the estimation problem of Chapter 2 are natural extensions of the associated single-sensor performance and algorithms.

In Section 3.2 we consider the case where a priori information about the relative likelihood of values of the information-bearing signal is available, and we would like to exploit it to improve the estimate quality. For instance, spatial or temporal correlations of the information-bearing signal are often known and can be used to obtain such an a priori description of the static parameter. As we show, such a priori information can be naturally incorporated in the signal encoding and estimation algorithms by using average rather than worst-case performance metrics to design these systems.

Finally, in Section 3.3 we examine another important extension of the static-case estimation problem of Chapter 2 where, in addition to the signal parameter of interest, the sensor noise power level is unknown. We show how the performance measures and the associated systems we developed in Chapter 2 can be extended to encompass this important case. In the context of all these extensions we will focus our attention on the special case that the sensor noise is IID zero-mean Gaussian, although in most cases our results can be generalized to a much broader class of nonGaussian sensor noises.

3.1  Multiple  Sensors 

In this section we examine a network generalization of the single-sensor problem stated in (2.1)-(2.2), namely, estimating an unknown parameter A from observation of

$$y_\ell[n] = F_\ell\left(A + v_\ell[n] + w_\ell[n]\right), \quad n = 1, 2, \dots, N, \quad \ell = 1, 2, \dots, L, \qquad (3.1)$$

where F_ℓ(·) is an M_ℓ-level quantizer of the form (2.2) with thresholds X_{ℓ,1}, ..., X_{ℓ,M_ℓ−1}, the v_ℓ[n]'s are IID processes, and the w_ℓ[n]'s denote the applied control input sequences. For convenience, we use Y^N to denote the following (NL) × 1 vector of N encoded observations from each of the L sensors:

$$Y^N = \left[\left[y_1^N\right]^T\ \left[y_2^N\right]^T\ \cdots\ \left[y_L^N\right]^T\right]^T. \qquad (3.2)$$

Networks employing encodings in the form of (3.1) provide attractive models for a number of distributed sensor networks. In Fig. 3-1, for instance, we show the block diagram of a special case of such a distributed estimation network, which employs feedback in the selection of the control inputs. In this section, we consider distributed estimation networks with and without feedback.

Figure 3-1: Block diagram of a network of distributed signal quantizers using feedback in the context of signal estimation. (Diagram labels: central processing unit; Â[n]; low-bandwidth signal.)

3.1.1  Statistically  Independent  Sensor  Noises 

In the case that the sensor noise processes v_ℓ[n] in (3.1) are statistically independent, straightforward extensions of the single-sensor systems developed in Chapter 2 yield network generalizations. In particular, these networks can be analyzed by means of the tools developed for the single-sensor case.

For the remainder of this section we restrict our attention to IID Gaussian sensor noise, which we use as a representative example to illustrate the extensions of the single-sensor results to the associated multi-sensor settings. Analogous extensions can be similarly derived for all the other scenarios we developed in Sections 2.2-2.3.

Pseudo-noise  Control  Inputs 

We may consider a network of sensors employing encodings of the form (3.1) for which the control inputs are IID pseudo-noise sequences with known statistical description w_ℓ[n] ∼ N(0, σ²_{w_ℓ}), and which can be adequately modeled as statistically independent of one another and of the sensor noises. We will consider two cases, which differ in whether the sensor noise levels and the quantizers are identical or different.

In the case that all quantizers are identical, i.e., F_ℓ(x) = F(x) for all x, and the sensor noises have equal strength, i.e., σ_{v_ℓ} = σ_v in (3.1), the collection of L observation vectors {y_ℓ^N} can be viewed as a single (NL) × 1 observation vector Y^N collected from a single sensor. Hence, in this case all the analysis of Sections 2.2.1 and 2.3.1 applies intact. For instance, in the special case M_ℓ = M = 2, the optimal noise level is given by σ_{w_ℓ} = σ_w^{opt} from (2.18), and the associated ML estimator is given by (2.43), where the N × 1 observation vector y^N is replaced with the (NL) × 1 vector Y^N, i.e.,

$$\hat{A}_{ML}\left(Y^N;\infty\right) = \sigma_\alpha\,Q^{-1}\!\left(\frac{\mathcal{K}_{-1}\left(Y^N\right)}{NL}\right). \qquad (3.3)$$
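The single-record view of (3.3) can be sketched in Python as follows. The sketch assumes Gaussian pseudo-noise that is independent across sensors and time. It takes σ_w = 2Δ/π, the pseudo-noise level used for Δ = 1 in Fig. 2-8, as an assumed stand-in for σ_w^{opt} from (2.18). Because the sensors are identical, the estimator only needs the total count of −1 outputs over all NL samples. Function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pooled_ml_estimate(Y, sigma_v, sigma_w, delta):
    """Pooled estimate in the spirit of (3.3), followed by the soft limiter I_Delta.

    Y is a list of per-sensor lists with entries in {-1, +1}; all sensors share sigma_v,
    sigma_w and a sign quantizer with threshold 0 (assumed setting).
    """
    sigma_a = math.sqrt(sigma_v ** 2 + sigma_w ** 2)   # std of the total noise v + w
    flat = [y for row in Y for y in row]
    frac = flat.count(-1) / len(flat)                 # K_{-1}(Y^N) / (N L)
    if frac == 0.0:
        return delta
    if frac == 1.0:
        return -delta
    a = sigma_a * NormalDist().inv_cdf(1.0 - frac)     # sigma_alpha * Q^{-1}(frac)
    return min(delta, max(-delta, a))


rng = random.Random(5)
A, delta, sigma_v = 0.7, 1.0, 0.1
sigma_w = 2.0 * delta / math.pi                        # assumed pseudo-noise level
L, N = 5, 2000
Y = [[1 if A + rng.gauss(0.0, sigma_v) + rng.gauss(0.0, sigma_w) >= 0.0 else -1
      for _ in range(N)] for _ in range(L)]
estimate = pooled_ml_estimate(Y, sigma_v, sigma_w, delta)
```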


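In the general case where the quantizers are distinct and the overall noise levels (summarizing the effects of the sensor noise and the pseudo-noise component) have different strengths, Cramer-Rao bounds and corresponding ML estimators can be formed with minor modifications of the single-sensor problem. Specifically, the optimal pseudo-noise power level σ_{w_ℓ} to be used at the ℓth sensor can be selected so as to optimize the single-sensor performance; for instance, for any ℓ for which M_ℓ = 2, the optimal σ_{w_ℓ} is given by (2.18) with σ_v replaced by σ_{v_ℓ}. Similarly, the ML estimator of A from observation of Y^N is given by the following extension of the single-sensor EM algorithm:

$$\hat{A}^{(k+1)}_{EM} = \mathcal{I}_\Delta\left(\hat{A}^{(k)}_{EM} + \frac{1}{\sqrt{2\pi}\,N\sum_{\ell=1}^{L}\sigma_{\alpha_\ell}^{-2}}\sum_{\ell=1}^{L}\sum_{m=1}^{M_\ell}\frac{\mathcal{K}_{Y_m}\!\left(y_\ell^N\right)}{\sigma_{\alpha_\ell}}\,\frac{\exp\left(-\frac{\left(X_{\ell,m-1}-\hat{A}^{(k)}_{EM}\right)^2}{2\sigma_{\alpha_\ell}^2}\right)-\exp\left(-\frac{\left(X_{\ell,m}-\hat{A}^{(k)}_{EM}\right)^2}{2\sigma_{\alpha_\ell}^2}\right)}{Q\left(\frac{X_{\ell,m-1}-\hat{A}^{(k)}_{EM}}{\sigma_{\alpha_\ell}}\right)-Q\left(\frac{X_{\ell,m}-\hat{A}^{(k)}_{EM}}{\sigma_{\alpha_\ell}}\right)}\right), \qquad (3.4)$$

where X_{ℓ,0} = −∞, X_{ℓ,M_ℓ} = ∞, and

$$\sigma_{\alpha_\ell} = \sqrt{\sigma_{v_\ell}^2 + \sigma_{w_\ell}^2}.$$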

As a direct extension of the single-sensor results discussed in Chapter 2, consider high peak SNR (i.e., Δ ≫ σ_{v_ℓ} for ℓ = 1, 2, ..., L). By selecting the pseudo-noise levels as σ_{w_ℓ} ∝ Δ, the worst-case information loss can be made to grow as slowly as the square of the parameter dynamic range Δ. This holds for any network of fixed size L with a fixed set of quantizers and sensor noise components with fixed statistical characterization.

Known  Control  Inputs 

Similarly, in the case that the control inputs w_ℓ[n] in (3.1) are known for estimation, we can easily extend the associated encoding strategies and estimation algorithms so as to achieve optimal performance in terms of minimizing the worst-case information loss rate. Specifically, we may design the encoding strategy used at each sensor separately, viewing it as a single-sensor system. If, for instance, M_ℓ = M = 2, we can select the control input sequence at the ℓth sensor according to (2.26), where K_ℓ is given by (2.27) with χ replaced by

$$\chi_\ell = \frac{\Delta}{\sigma_{v_\ell}}, \qquad (3.5)$$

X  =  2fd00,  and  is  given  by  (2.30). 

The performance of networks of sensors in the context of known control inputs is a natural extension of the associated single-sensor performance. For a fixed network size, with fixed quantizers and sensor noise PDFs, the worst-case information loss can be made to grow linearly with the signal dynamic range Δ by means of the encoding scheme described by (3.5).

Natural extensions of the EM algorithm (2.49) can be used to perform efficient data fusion and signal estimation in the multi-sensor case. In particular, assuming that a K_ℓ-periodic sequence is used as the bias of the ℓth quantizer F_ℓ(·) in (3.1), we have

(3.6a)

and

$$\hat{A}_{ML} = \lim_{k\to\infty}\hat{A}^{(k)}_{EM}, \qquad (3.6b)$$

where N_ℓ = ⌊N/K_ℓ⌋ and y_ℓ^{N_ℓ}[i] is the N_ℓ × 1 vector comprising the elements of the ith K_ℓ-decimated subsequence, i.e.,

$$y_\ell^{N_\ell}[i] = \left[y_\ell[i]\ y_\ell[K_\ell+i]\ \cdots\ y_\ell[N-K_\ell+i]\right]^T, \quad i = 1, 2, \dots, K_\ell, \qquad (3.6c)$$

and

$$z_\ell^{(k)}[m, i] = \frac{X_{\ell,m} - \hat{A}^{(k)}_{EM} - w_\ell[i]}{\sigma_{v_\ell}}. \qquad (3.6d)$$

If all the quantizers are identical, i.e., F_ℓ(·) = F(·), and all the sensor noises have identical PDFs, the above encoding design method results in selecting the same control input sequence for each sensor, i.e., w_ℓ[n] = w[n] from (2.26), where K is given by (2.27). When in addition L ≫ K, or when L is an integer multiple of K, the encoding strategy can be simplified even further by spatially distributing the K possible control input values. Specifically, consider for simplicity the case where κ = L/K is an integer. Divide the L sensors into K groups of κ sensors, and assign to each group a different one of the K samples of the K-periodic sequence (2.26) as the control input of all its sensors for all n. In this way we can achieve optimal encoding performance without the need for a time-varying control input.


Control  Inputs  in  the  Presence  of  Feedback 

Networks exploiting feedback from the host (observing the quantized outputs) to the sensors (using quantizer bias control) can also be analyzed using the associated single-sensor principles. As a natural extension of the single-sensor results, at each sensor our objective is to operate around the point where the information loss is minimized; in the case M = 2, for instance, performance is optimized when we operate in the vicinity of the quantizer threshold. As a generalization of the single-sensor analysis for quantizer bias control via feedback, the control input can be selected using (2.52), where Â[n − 1] denotes the estimate of A based on observations collected from all L sensors up to and including time n − 1. For instance, in the case M = 2, the multi-sensor extension of the ML estimator (2.53) is given by


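$$\hat{A}_{EM}^{(k+1)}[n] = \hat{A}_{EM}^{(k)}[n] + \frac{1}{\sqrt{2\pi}\, n \sum_{\ell=1}^{L} \sigma_{v_\ell}^{-2}} \sum_{m=1}^{n} \sum_{\ell=1}^{L} \frac{y_\ell[m]}{\sigma_{v_\ell}} \, \frac{\exp\left(-\frac{(\hat{A}_{ML}[m-1] - \hat{A}_{EM}^{(k)}[n])^2}{2\sigma_{v_\ell}^2}\right)}{Q\left(y_\ell[m]\, \frac{\hat{A}_{ML}[m-1] - \hat{A}_{EM}^{(k)}[n]}{\sigma_{v_\ell}}\right)} , \tag{3.7a}$$

initialized with $\hat{A}_{EM}^{(0)}[n] = \hat{A}_{ML}[n-1]$ and $\hat{A}_{ML}[0] = 0$, where for any $n$,

$$\hat{A}_{ML}[n] = \lim_{k \to \infty} \hat{A}_{EM}^{(k)}[n] . \tag{3.7b}$$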
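The following sketch shows one way to carry out the fixed-point iteration (3.7) at a single time $n$. It assumes $M = 2$ quantizers with outputs $\pm 1$ and threshold at zero, zero-mean Gaussian sensor noises with known levels $\sigma_{v_\ell}$, and effective feedback thresholds $t[m] = \hat{A}_{ML}[m-1]$ (i.e., the common control input $w[m] = -\hat{A}_{ML}[m-1]$). All function names are ours.

Written this way, each EM step moves the estimate by the log-likelihood score divided by the complete-data Fisher information $n \sum_\ell \sigma_{v_\ell}^{-2}$. A fixed point is therefore a zero of the score.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def score(a, y, t, sigmas):
    """Derivative w.r.t. A of the log-likelihood of +/-1 outputs with thresholds t[m]."""
    total = 0.0
    for m, t_m in enumerate(t):
        for ell, s in enumerate(sigmas):
            z = (t_m - a) / s
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            total += (y[m][ell] / s) * pdf / q_func(y[m][ell] * z)
    return total


def ml_em_feedback(y, t, sigmas, a0, iters=2000):
    """EM iteration in the form of (3.7a) after n = len(t) time steps.

    y[m][ell] in {-1, +1} is the output of sensor ell at time m (0-based);
    t[m] is the effective threshold A_ML[m-1] induced by the feedback control input.
    """
    n = len(t)
    complete_info = n * sum(s ** -2 for s in sigmas)
    a = a0
    for _ in range(iters):
        # Increment of (3.7a): score divided by the complete-data information.
        a = a + score(a, y, t, sigmas) / complete_info
    return a
```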


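The associated multi-sensor extension of (2.58) is similarly given by

$$\hat{A}[n] = \begin{cases} \hat{A}_{ML}[n] \text{ from (3.7)} & \text{if } n \le n_0 \\[4pt] \hat{A}[n-1] + \sqrt{\dfrac{\pi}{2}}\, \dfrac{\sum_{\ell=1}^{L} y_\ell[n]/\sigma_{v_\ell}}{n \sum_{\ell=1}^{L} \sigma_{v_\ell}^{-2}} & \text{if } n > n_0 \end{cases} \tag{3.8}$$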
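The recursive branch of (3.8) needs only one multiply-accumulate per sensor and time step. The simulation below is a minimal sketch under the following assumptions:

- quantizer outputs are $\pm 1$ with the threshold at zero;
- sensor noises are Gaussian;
- all sensors share the feedback control input $w[n] = -\hat{A}[n-1]$;
- for brevity, the recursion is applied from $n = 1$ with $\hat{A}[0] = 0$, instead of switching over from the ML estimate (3.7) at $n_0$.

The parameter values in the usage below mirror those of Fig. 3-2. The function name is ours.

```python
import math
import random


def recursive_feedback_estimate(a_true, sigmas, num_steps, seed=0):
    """Simulate L binary sensors with feedback and run the recursion of (3.8) for all n >= 1."""
    rng = random.Random(seed)
    inv_var_sum = sum(s ** -2 for s in sigmas)
    gain = math.sqrt(math.pi / 2.0)
    a_hat = 0.0  # assumed initial estimate A[0]
    for n in range(1, num_steps + 1):
        w = -a_hat  # common control input fed back from the host
        weighted = 0.0
        for s in sigmas:
            y = 1 if a_true + rng.gauss(0.0, s) + w > 0 else -1
            weighted += y / s
        a_hat = a_hat + gain * weighted / (n * inv_var_sum)
    return a_hat
```

For instance, `recursive_feedback_estimate(0.4, [0.08, 0.08, 0.08, 0.2, 0.4], 500)` simulates the five-sensor network of Fig. 3-2 for $N = 500$ samples.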


Fig. 3-2 depicts the MSE performance of the ML estimator (3.7) and of $\hat{A}[N]$ given by (3.8) for a network of $L = 5$ sensors. As in the single-sensor case, the MSEs of both estimators practically achieve the associated Cramer-Rao bound, corresponding to a 2 dB information loss, for moderate $N$. In general, spatial redundancy (large $L$) leads to faster convergence to the associated 2 dB bound. This fact is exploited in Chapter 5, where we develop encodings for sensor networks used to estimate fast time-varying signals.

In the Gaussian scenario, for networks of sensors encoding 1 bit of information per measurement and employing quantizer bias control with feedback, the associated information loss can be obtained directly by appropriate interpretation of Fig. 2-7, which describes the single-sensor case. Similar extensions of the associated single-sensor problem can be obtained for any set of sensor noises for $M > 2$. For instance, if feedback is available and properly used in the multi-sensor setting shown in Fig. 3-1, a small worst-case information (and MSE) loss can be achieved, independent of the dynamic range and the noise power levels. This small information loss, however, will in general depend on both the quantizers $F_\ell(\cdot)$ and the sensor noise PDFs.

Figure 3-2: MSE for $\hat{A}_{ML}[N]$ and $\hat{A}[N]$ for a network of $L = 5$ two-level quantizers, using feedback in the selection of the control input, and associated Cramer-Rao bounds (see also caption of Fig. 2-9). The sensor noise levels are 0.08, 0.08, 0.08, 0.2, and 0.4, while $A = 0.4$ and $\Delta = 1$.


3.1.2  Perfectly  Correlated  Sensor  Noises 

In this section we consider an example involving sensor noises that are spatially correlated. In particular, we consider the case where the concurrent sensor noise samples are perfectly correlated in space, i.e., $v_\ell[n] = v[n]$ for $1 \le \ell \le L$, where $v[n]$ is an IID sequence.

This model may be naturally suited for distributed estimation settings in which there is an additive distortion component that is identical at a number of distinct sensors. In addition, this model may arise in a variety of other applications involving estimation from coarsely digitized measurements; in the context of analog-to-digital conversion of noisy signals, for instance, this model may provide a reasonably accurate representation of the noisy analog signal that is to be digitized by each element in an A/D converter array of inexpensive components.

For such systems, the analysis in the presence of known periodic control inputs, or of control inputs selected using feedback information, naturally decouples into that of the associated single-sensor problems we have already considered in Chapter 2. For instance, a network of $L$ binary quantizers whose control inputs are known for estimation is equivalent to a single $(L+1)$-level sensor with known time-varying thresholds.
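The equivalence can be checked directly. In the perfectly correlated setting, every binary quantizer sees the same input $s = A + v[n]$ plus its own known control input. The number of $+1$ outputs is then a nondecreasing staircase function of $s$ with steps at $-w_\ell[n]$, i.e., an $(L+1)$-level quantizer with known thresholds.

The following is a minimal sketch at a single time instant. It assumes $\pm 1$ outputs and a threshold at zero. The helper names are ours.

```python
def binary_network_outputs(s, controls):
    """Outputs of L binary quantizers applied to s + w_l: +1 for positive input, else -1."""
    return [1 if s + w > 0 else -1 for w in controls]


def equivalent_level(outputs):
    """Level index (0..L) of the equivalent (L+1)-level quantizer: the number of +1 outputs."""
    return sum(1 for y in outputs if y == 1)


def multilevel_quantizer(s, thresholds):
    """(L+1)-level quantizer returning the number of thresholds strictly below s."""
    return sum(1 for x in thresholds if s > x)
```

Passing `thresholds = [-w for w in controls]` to `multilevel_quantizer` reproduces `equivalent_level(binary_network_outputs(s, controls))` for any input `s`.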

Henceforth, we focus on the special case where the control inputs $w_\ell[n]$ correspond to pseudo-noise sequences that are well modeled as independent IID Gaussian sequences, each with variance $\sigma_w^2$, and on the case $M = 2$.

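Motivated by the form of the estimator (3.3), we focus on the following elementary estimator of $A$

$$\hat{A}(\mathbf{Y}_N) = \mathcal{I}_\Delta\!\left(-\sigma_\alpha\, Q^{-1}\!\left(\hat{k}_1(\mathbf{Y}_N)\right)\right) , \tag{3.9a}$$

where $\mathbf{Y}_N$ is given by (3.2) and

$$\hat{k}_1(\mathbf{Y}_N) = \frac{\mathcal{K}_1(\mathbf{Y}_N)}{NL} = \frac{1}{NL} \sum_{n=1}^{N} \sum_{\ell=1}^{L} \mathbb{1}\{y_\ell[n] = 1\} . \tag{3.9b}$$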
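The following is a minimal Monte-Carlo sketch of the estimator (3.9). It rests on these assumptions:

- outputs are $\pm 1$ with the threshold at zero;
- a single Gaussian noise sample $v[n]$ is shared by all sensors at each time;
- each sensor adds independent Gaussian pseudo-noise $w_\ell[n]$;
- $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$;
- $\mathcal{I}_\Delta(\cdot)$ is taken to be clipping to $[-\Delta, \Delta]$, which is our reading of the range restriction.

The code uses the identity $-\sigma_\alpha Q^{-1}(p) = \sigma_\alpha \Phi^{-1}(p)$, with $\Phi^{-1}$ taken from `statistics.NormalDist`. The parameter values in the usage below follow Fig. 3-3. The function name is ours.

```python
import math
import random
from statistics import NormalDist


def correlated_network_estimate(a_true, sigma_v, sigma_w, delta, n_times, n_sensors, seed=0):
    """Simulate N x L binary outputs with shared noise v[n] and apply the estimator (3.9)."""
    rng = random.Random(seed)
    sigma_alpha = math.sqrt(sigma_v ** 2 + sigma_w ** 2)
    ones = 0
    for _ in range(n_times):
        v = rng.gauss(0.0, sigma_v)  # noise sample shared by all sensors
        for _ in range(n_sensors):
            if a_true + v + rng.gauss(0.0, sigma_w) > 0:
                ones += 1
    k_hat = ones / (n_times * n_sensors)  # fraction of +1 outputs, as in (3.9b)
    if k_hat <= 0.0:
        return -delta
    if k_hat >= 1.0:
        return delta
    # -sigma_alpha * Q^{-1}(k_hat) equals sigma_alpha * Phi^{-1}(k_hat).
    a_hat = sigma_alpha * NormalDist().inv_cdf(k_hat)
    return min(max(a_hat, -delta), delta)
```

For instance, `correlated_network_estimate(0.6, 0.02, 0.6, 1.0, 200, 50)` produces one realization of the estimate for $A = 0.6$, $\sigma_v = 0.02$, $\sigma_w = 0.6$, $\Delta = 1$, $N = 200$, and $L = 50$.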


We will mainly focus on the case where $\sigma_v \ll \Delta$, which corresponds to significant worst-case information loss in the case $L = 1$ examined in Chapter 2, even when the pseudo-noise level is optimally selected (see dash-dot curve in Fig. 2-7 for large $\Delta/\sigma_v$). We can show, by methods very similar to those used in App. A.4, that for large $N$ and $L$ the MSE of the estimator in (3.9) is reasonably approximated as follows


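$$E\left[\left(A - \hat{A}(\mathbf{Y}_N)\right)^2\right] \approx \frac{1}{NL}\, B(A; \sigma_\alpha) + \frac{L-1}{NL}\, \eta(A, \sigma_v, \sigma_w) , \tag{3.10}$$

where $B(A; \sigma_\alpha)$ is given by (2.16) with $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$, and

$$\eta(A, \sigma_v, \sigma_w) = 2\pi \sigma_\alpha^2 \exp\left(\frac{A^2}{\sigma_\alpha^2}\right) \left[\rho(A, \sigma_v, \sigma_w) - Q^2\left(-\frac{A}{\sigma_\alpha}\right)\right] ,$$

where

$$\rho(A, \sigma_v, \sigma_w) = \frac{1}{\sqrt{2\pi}\,\sigma_v} \int_{-\infty}^{\infty} Q^2\left(-\frac{A + v}{\sigma_w}\right) \exp\left(-\frac{v^2}{2\sigma_v^2}\right) dv .$$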
Figure 3-3: Estimation in the presence of perfectly correlated sensor noise components. The pseudo-noise sequences $w_\ell[n]$ for $\ell = 1, 2, \ldots, L$ are modeled as independent IID Gaussian noise sources, independent of $v[n]$, with $\sigma_w = 0.6$. The solid (dashed) curve corresponds to the predicted MSE loss, while the “o” (“*”) marks depict the MSE loss from Monte-Carlo simulations for the estimator (3.9) for $A = 0.6$ and $\sigma_v = 0.02$ ($\sigma_v = 0.1$).

In Fig. 3-3 we present the results of Monte-Carlo simulations on the MSE loss of the estimator (3.9) for two representative sensor noise levels. The solid (dashed) curve depicts the MSE estimate (3.10) for $A = 0.6$, $\sigma_v = 0.02$ ($\sigma_v = 0.1$), while the “o” (“*”) symbols depict the associated MSE from Monte-Carlo simulations. The pseudo-noise level used at all sensors was $\sigma_w = 0.6$, while the signal dynamic range is $\Delta = 1$. As the figure illustrates, (3.10) predicts the MSE loss fairly accurately for large $L$ in these two examples.


Eqn. (3.10) suggests a method for obtaining an approximate value for the minimum number of sensors $L_{min}$ that is required in order to achieve performance within $C$ dB of the unconstrained performance $\sigma_v^2/N$, for $\sigma_v \ll \Delta$, by means of the estimator (3.9). In this parameter regime, $\eta(A, \sigma_v, \sigma_w) \approx \sigma_v^2$ and $\sigma_w \approx \sigma_\alpha$, which together imply that

$$L_{min} \approx \frac{\sigma_v^{-2}\, B(\Delta; \sigma_w)}{10^{C/10} - 1} . \tag{3.11}$$


In Fig. 3-4 we present the network size $L_{min}$ required to achieve MSE performance within 2 dB (solid curve) and 1 dB (dashed curve) of the best performance based on the original infinite-resolution noisy measurements (i.e., $B(A; s^N) = \sigma_v^2/N$), as a function of $\sigma_w/\Delta$ according to (3.11). The “o” and “*” marks depict the required network size $L_{min}$ by means of (3.10) for $\sigma_v/\Delta = 0.02$ and $\sigma_v/\Delta = 0.1$, respectively. Note, for instance, that if $\Delta = 1$, then for $\sigma_w = 0.6$ and $\sigma_v = 0.1$, a network of size $L \approx 280$ is needed to achieve the infinite-resolution bound within 2 dB, while a network of size $L \approx 635$ reaches within 1 dB of the infinite-resolution performance.

Figure 3-4: Minimum network size $L_{min}$ required for reaching within 2 dB (solid curve) and 1 dB (dashed curve) of the infinite-resolution MSE, as predicted by (3.11). The “o” and “*” marks depict the required $L_{min}$ according to (3.10) for $\sigma_v/\Delta = 0.02$ and $\sigma_v/\Delta = 0.1$, respectively.
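The arithmetic behind these network sizes can be reproduced from (3.11). This sketch relies on two assumptions of ours, not statements from the text:

- (2.16) for $M = 2$ in Gaussian noise, with the threshold at zero, takes the standard form $B(A; \sigma) = 2\pi\sigma^2 Q(A/\sigma)\,Q(-A/\sigma)\exp(A^2/\sigma^2)$;
- the bound is evaluated at the worst case $A = \Delta$.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def crb_binary(a, sigma):
    """Per-sample Cramer-Rao bound for a sign quantizer in N(0, sigma^2) noise (assumed form of (2.16))."""
    x = a / sigma
    return 2.0 * math.pi * sigma ** 2 * q_func(x) * q_func(-x) * math.exp(x * x)


def min_network_size(delta, sigma_v, sigma_w, loss_db):
    """Approximate L_min from (3.11), evaluated at the edge of the dynamic range."""
    return crb_binary(delta, sigma_w) / (sigma_v ** 2 * (10.0 ** (loss_db / 10.0) - 1.0))
```

With $\Delta = 1$, $\sigma_w = 0.6$, and $\sigma_v = 0.1$, this gives about 283 sensors for 2 dB and about 639 for 1 dB. These values are close to the figures $L \approx 280$ and $L \approx 635$ quoted above.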


3.2  Incorporation  of  Prior  Information 

In a number of applications involving estimation from digital encodings in the form of quantizer bias control, we may have prior information about the relative likelihood of various values of the information-bearing signal, which in the static case is a single parameter $A$. Such a priori information can arise from a variety of sources, such as the underlying mechanisms that generate the information-bearing signal. As we will see in Chapter 5, temporal correlations in the information-bearing signal can often be exploited in the form of a priori information.

In all these cases an average rather than worst-case performance metric is more naturally suited for system analysis and design. As we show in this section, we can design encodings based on quantizer bias control for which the average information loss rates exhibit strikingly similar behavior to the associated worst-case information loss rates developed for unknown parameter estimation. Specifically, the quality of the encoding is characterized by the average encoding performance given by

$$\bar{B}(A; y) = E\left[B(A; y)\right] = \int B(A; y)\, p_A(A)\, dA , \tag{3.12}$$

and the average information loss given by

$$\bar{\mathcal{L}}(p_A(\cdot)) = E\left[\mathcal{L}(A)\right] = \int \mathcal{L}(A)\, p_A(A)\, dA , \tag{3.13}$$

where $\mathcal{L}(A)$ is given by (2.4), and $B(A; y)$ is the Cramer-Rao bound for estimating an unknown parameter $A$ based on any one sample of the IID sequence $y[n]$. Since the best possible MSE performance based on the uncoded set of observations $s[n]$ satisfies (the bound $B(A; s)$ does not depend on $A$)

$$\bar{B}(A; s) = B(A; s) ,$$

the average information loss (3.13) and average Cramer-Rao bound (3.12) can be used interchangeably as measures of performance.

The metrics (3.12) and (3.13) are reasonable performance metrics for assessing the encoding performance for a large number of observations and, in particular, as $N \to \infty$. Specifically, if $N$ is large enough that the information due to the encodings $y^N$ dominates the prior information, (3.12) and (3.13) represent the MSE limits for estimation based on the encodings only, averaged with respect to the prior $p_A(\cdot)$. At the other extreme, where $N = 0$, there is no information loss in using the encodings instead of the original data: in both cases the only information available is due to the prior. In general, the larger $N$, the more information is due to the encodings, and thus the larger the information loss. Thus, for small finite $N$ the information loss due to the encodings is in general less than (3.13).

We next consider two representative forms of a priori information. First, we consider the case where the random variable $A$ is uniformly distributed within the range $(-\Delta, \Delta)$. Subsequently, we consider a case where the parameter $A$ is a Gaussian random variable (and thus is not range-limited).

3.2.1  Uniformly  Distributed  Signal 

In this section we develop extensions of the encoding and estimation algorithms of Chapter 2 where the objective is to optimize average rather than worst-case performance over the signal dynamic range. It is often reasonable to assume that the random variable $A$ is a priori uniformly distributed in $(-\Delta, \Delta)$.

Estimation  Algorithms 

Analogously to the ML estimate for unknown parameters, we may consider the maximum a posteriori (MAP) estimate of the random variable $A$ given $y^N$ [31], namely

$$\hat{A}_{MAP}(y^N) = \arg\max_{\theta} \left[\ln p_{y^N|A}(y^N \mid \theta) + \ln p_A(\theta)\right] .$$

Due to the particular form of the prior $p_A(\cdot)$ (its logarithm is constant on $(-\Delta, \Delta)$ and $-\infty$ outside), for any type of encodings generated via quantizer bias control, the MAP estimate is identical to the associated ML estimate with range restriction in $(-\Delta, \Delta)$ developed in Chapter 2. Consequently, for all encoding scenarios in the form of quantizer bias control that we have considered in Chapter 2, the associated estimation algorithms developed in Chapter 2 are asymptotically optimal, in the sense that they asymptotically achieve the associated average information loss (3.13) of the encodings. Hence, we simply need to redesign the encoding strategies, bearing in mind that we now need to optimize average rather than worst-case information loss performance.
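For pseudo-noise encodings with $M = 2$, this equivalence can be checked numerically. The check assumes $\pm 1$ outputs, a threshold at zero, and Gaussian aggregate noise of level $\sigma_\alpha$. Under these assumptions:

- the outputs are IID with $P(y = 1) = Q(-A/\sigma_\alpha)$;
- the unrestricted ML estimate is $\sigma_\alpha \Phi^{-1}(\hat{k})$, where $\hat{k}$ is the fraction of $+1$ outputs.

The sketch compares the clipped ML estimate with a brute-force grid maximization of the posterior under the uniform prior. The function names are ours.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def log_posterior_uniform(theta, n_plus, n_minus, sigma_alpha, delta):
    """Log posterior of A (up to a constant) under a uniform prior on (-delta, delta)."""
    if abs(theta) > delta:
        return -math.inf
    p_plus = _STD.cdf(theta / sigma_alpha)  # P(y = +1) = Q(-theta / sigma_alpha)
    return n_plus * math.log(p_plus) + n_minus * math.log(1.0 - p_plus)


def map_grid(n_plus, n_minus, sigma_alpha, delta, points=4001):
    """Brute-force MAP estimate on a uniform grid covering [-delta, delta]."""
    grid = [-delta + 2.0 * delta * i / (points - 1) for i in range(points)]
    return max(grid, key=lambda th: log_posterior_uniform(th, n_plus, n_minus, sigma_alpha, delta))


def clipped_ml(n_plus, n_minus, sigma_alpha, delta):
    """Closed-form ML estimate from the fraction of +1 outputs, clipped to [-delta, delta]."""
    k_hat = n_plus / (n_plus + n_minus)
    if k_hat == 1.0:
        return delta
    if k_hat == 0.0:
        return -delta
    return min(max(sigma_alpha * _STD.inv_cdf(k_hat), -delta), delta)
```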

Pseudo-noise  inputs 

We first consider the estimation problem (2.1) with $F(\cdot)$ given by (2.2), where the control input is a pseudo-noise sequence. We assume that $w[n]$ and $v[n]$ are independent IID zero-mean Gaussian sequences with variance $\sigma_w^2$ and $\sigma_v^2$, respectively, and independent of $A$. We wish to select the pseudo-noise level $\sigma_w$ so as to minimize the average information loss (3.13). For convenience, we use $\bar{B}(\Delta, \sigma_v, \sigma_w)$ to denote (3.12) for a given $\Delta$, $\sigma_v$, and $\sigma_w$, and $\bar{B}(\Delta, \sigma_\alpha)$ to denote (3.12) for a given $\Delta$ and $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$. Similarly, we let $\bar{\mathcal{L}}(\Delta, \sigma_v, \sigma_w)$ denote (3.13) for a given $\Delta$, $\sigma_v$, and $\sigma_w$, and $\bar{\mathcal{L}}(\Delta, \sigma_\alpha)$ denote (3.13) for a given $\Delta$ and $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$.

For any admissible sensor noise distribution, let $\sigma_{opt}(\Delta)$ denote the sensor noise level that minimizes the average encoding performance (3.12), i.e.,

$$\sigma_{opt}(\Delta) = \arg\min_{\sigma} \bar{B}(\Delta, \sigma) . \tag{3.14}$$

In a manner analogous to the treatment of the unknown parameter case, we can show that $\sigma_{opt}(\Delta) > 0$ for any $\Delta > 0$. In particular,

$$\sigma_{opt}(\Delta) = \arg\min_{\sigma} \bar{B}(\Delta, \sigma) = \Delta \arg\min_{\tilde{\sigma}} E\left[B(A/\Delta; \tilde{\sigma})\right] = \sigma_{opt}(1)\, \Delta , \tag{3.15}$$

which also implies that

$$\bar{B}(\Delta, \sigma_{opt}(\Delta)) = \Delta^2\, \bar{B}(1, \sigma_{opt}(1)) . \tag{3.16}$$

Fig. 3-5 depicts $\bar{B}(\Delta, \sigma_v)/\Delta^2$ as a function of $\sigma_v/\Delta$ for $A$ uniformly distributed. As the figure reveals, $\sigma_{opt}(\Delta) > 0$ for $\Delta > 0$. In particular, numerical evaluation of (3.14) for $\Delta = 1$ together with (3.15) yields

$$\sigma_{opt}(\Delta) = \sigma_{opt}(1)\, \Delta \approx 0.4622\, \Delta . \tag{3.17}$$
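The constant in (3.17) can be reproduced numerically. This assumes that the bound (2.16) for $M = 2$ in Gaussian noise, with the threshold at zero, is $B(A; \sigma) = 2\pi\sigma^2 Q(A/\sigma)\,Q(-A/\sigma)\exp(A^2/\sigma^2)$.

The sketch below proceeds in two steps:

1. It averages the bound over $A$ uniform on $(-1, 1)$ with Simpson's rule.
2. It minimizes the result over $\sigma$ with a golden-section search.

The search interval $[0.2, 1]$ and the tolerances are our choices.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def crb_binary(a, sigma):
    x = a / sigma
    return 2.0 * math.pi * sigma ** 2 * q_func(x) * q_func(-x) * math.exp(x * x)


def avg_crb_uniform(delta, sigma, intervals=2000):
    """(1/(2*delta)) * integral of B(A; sigma) over (-delta, delta), via Simpson's rule.

    The integrand is even in A, so we integrate over (0, delta) and divide by delta.
    """
    h = delta / intervals
    total = crb_binary(0.0, sigma) + crb_binary(delta, sigma)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * crb_binary(i * h, sigma)
    return total * h / 3.0 / delta


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:  # the minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = f(c)
        else:  # the minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def sigma_opt_uniform():
    return golden_section_min(lambda s: avg_crb_uniform(1.0, s), 0.2, 1.0)
```

The minimum value `avg_crb_uniform(1.0, sigma_opt_uniform())` is the constant $\bar{B}(1, \sigma_{opt}(1))$ that multiplies $\Delta^2$ in (3.16).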


The existence of a nonzero optimal noise level in terms of minimizing the average MSE performance of the encoding for any given $\Delta$ can be exploited to provide encoding performance benefits by means of pseudo-noise bias control. Specifically, by choosing the pseudo-noise power level as

$$\sigma_w^{opt}(\Delta) = \begin{cases} \sqrt{[\sigma_{opt}(\Delta)]^2 - \sigma_v^2} & \text{if } \sigma_{opt}(\Delta) > \sigma_v \\ 0 & \text{otherwise} \end{cases} \tag{3.18}$$


where $\sigma_{opt}(\Delta)$ is given by (3.17), the average encoding performance at high SNR $\chi$, defined as $\chi = \Delta/\sigma_v$, is given by (3.16), which, in conjunction with (2.8), gives

$$\bar{\mathcal{L}}_{pn}(\chi) = \bar{\mathcal{L}}(1, \sigma_{opt}(1))\, \chi^2 \tag{3.19}$$

for large enough $\chi$.

Figure 3-5: $\bar{B}(\Delta, \sigma_v)/\Delta^2$ as a function of $\sigma_v/\Delta$ when $A$ is a priori uniformly distributed in $(-\Delta, \Delta)$.

For comparison, consider the case where the control input is set to zero and the sensor noise is Gaussian. The average information loss in this case is given by


$$\bar{B}(\Delta; \sigma_v) = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} 2\pi\sigma_v^2\, Q\left(\frac{A}{\sigma_v}\right) Q\left(-\frac{A}{\sigma_v}\right) \exp\left(\frac{A^2}{\sigma_v^2}\right) dA \approx \sqrt{2\pi}\,\sigma_v^2\, \frac{\exp(\chi^2/2)}{\chi^3} ,$$

where the approximation holds for large $\chi$. Combining the above approximation with (2.8), we obtain the average information loss in Gaussian noise in the absence of a control input, namely,

$$\bar{\mathcal{L}}_{free}(\chi) \approx \frac{\sqrt{2\pi}}{\chi^3} \exp(\chi^2/2) ,$$

which grows at a rate much faster than the optimal pseudo-noise case in (3.19). Again, we can easily show by extending the proof of the unknown parameter case that pseudo-noise yields performance benefits for any admissible sensor noise PDF and any quantizer with $M \ge 2$.
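The contrast between the two growth rates can be checked numerically. The check rests on two readings of ours:

- the Gaussian form of (2.16) assumed above;
- the information loss in (2.8) taken as the ratio of the average bound to $B(A; s) = \sigma_v^2$.

The sketch evaluates the exact average loss without a control input and compares it with $\sqrt{2\pi}\exp(\chi^2/2)/\chi^3$.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def avg_loss_no_control(chi, intervals=6000):
    """Average over A ~ U(-Delta, Delta) of B(A; sigma_v) / sigma_v^2, with chi = Delta / sigma_v.

    Substituting x = A / sigma_v gives (1/chi) * integral_0^chi 2*pi*Q(x)Q(-x)exp(x^2) dx,
    evaluated here with Simpson's rule.
    """
    def g(x):
        return 2.0 * math.pi * q_func(x) * q_func(-x) * math.exp(x * x)

    h = chi / intervals
    total = g(0.0) + g(chi)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0 / chi


def avg_loss_no_control_asymptotic(chi):
    """Large-chi approximation sqrt(2*pi) * exp(chi^2 / 2) / chi^3."""
    return math.sqrt(2.0 * math.pi) * math.exp(chi * chi / 2.0) / chi ** 3
```

At $\chi = 6$ the exact value exceeds the approximation by only a few percent. It is also several thousand times larger than the value at $\chi = 4$, whereas (3.19) grows only by the factor $(6/4)^2$ over the same range.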

Known  Control  Inputs  and  Control  Inputs  via  Feedback 

Similar extensions of the unknown-parameter counterparts can be developed for known control inputs; we may use periodic control inputs of the form (2.26), where by choosing $K$ according to (2.27) we can again achieve a linear growth rate for the average information loss as a function of $\chi$.

Finally,  in  the  presence  of  feedback  the  encoding  and  estimation  strategies  used  in 
Chapter  2  achieve  the  2  dB  loss  for  all  parameter  values,  which  implies  that  worst-case  and 
average  performance  are  in  this  case  identical. 

Similar behavior is exhibited by other a priori PDFs. We next describe encoding strategies and estimation algorithms for the case where the random variable $A$ is Gaussian and the sensor noise is also Gaussian.

3.2.2  Normally  Distributed  Signal 

In a number of practical scenarios the information-bearing signal is not range-limited; that is, a uniform PDF fails to provide an accurate a priori characterization of the parameter. Often, it is reasonable to assume that $A$ is a priori normally distributed with mean $m_A$ and power level $\sigma_A^2$. For instance, this is a naturally suited a priori signal description in cases where the random parameter denotes the overall effect of a large collection of finite-power events. As we will show, there is also a natural measure of signal-to-noise ratio in the design and performance evaluation of these systems. Again we focus on the case where the sensor noise PDF is Gaussian.

Pseudo-Noise  Control  Inputs 

We first consider the estimation problem (2.1) with $F(\cdot)$ given by (2.2) in the case where the control input is a pseudo-noise sequence. We assume that $w[n]$ and $v[n]$ are independent IID zero-mean normally distributed sequences with variance $\sigma_w^2$ and $\sigma_v^2$, respectively, and independent of the Gaussian random variable $A$. We wish to select the pseudo-noise power level $\sigma_w$ so as to minimize the average information loss (3.13). For illustration, we focus on the case $M = 2$.


Figure 3-6: $\bar{B}(\sigma_A, \sigma_v)/\sigma_A^2$ as a function of $\sigma_v/\sigma_A$ when $A$ is a priori zero-mean and normally distributed with variance $\sigma_A^2$.

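First we consider the special case where $A$ is zero-mean. For convenience, we use $\bar{B}(\sigma_A, \sigma_v, \sigma_w)$ to denote (3.12) for a given $\sigma_A$, $\sigma_v$, and $\sigma_w$, and $\bar{B}(\sigma_A, \sigma_\alpha)$ to denote (3.12) for a given $\sigma_A$ and $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$. Similarly, let $\bar{\mathcal{L}}(\sigma_A, \sigma_v, \sigma_w)$ denote (3.13) for a given $\sigma_A$, $\sigma_v$, and $\sigma_w$, and $\bar{\mathcal{L}}(\sigma_A, \sigma_\alpha)$ denote (3.13) for a given $\sigma_A$ and $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$. Following the analysis of Section 3.2.1, we can show that for any given signal power level $\sigma_A > 0$, there is an optimal aggregate noise level $\sigma_{opt}$ in terms of minimizing the average encoding performance (3.12). In particular, similar to (3.15) we have

$$\sigma_{opt}(\sigma_A) = \sigma_{opt}(1)\, \sigma_A \approx 1.1395\, \sigma_A , \tag{3.20}$$

where $\sigma_{opt}(1)$ has been numerically computed. Fig. 3-6 depicts $\bar{B}(\sigma_A, \sigma_v)/\sigma_A^2$ as a function of $\sigma_v/\sigma_A$, where $A \sim \mathcal{N}(0, \sigma_A^2)$. As the figure reveals, $\sigma_{opt}(\sigma_A) > 0$ for $\sigma_A > 0$. Eqn. (3.20) also implies that

$$\bar{B}(\sigma_A, \sigma_{opt}(\sigma_A)) = \sigma_A^2\, \bar{B}(1, \sigma_{opt}(1)) . \tag{3.21}$$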
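Under the same assumed Gaussian form of (2.16), the constant in (3.20) can be checked by averaging the bound over $A \sim \mathcal{N}(0, 1)$ and minimizing over $\sigma$. Two details of this sketch are our own choices:

- The search interval is $[1.1, 1.3]$. The average bound diverges for $\sigma \le 1$, and the text places the optimum near 1.14.
- The integral is truncated at $|A| = 30\sigma$. Beyond that point the integrand is negligible on this interval.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def avg_crb_gaussian(sigma, intervals=4000, span=30.0):
    """E[B(A; sigma)] for A ~ N(0, 1), with B(A; sigma) = 2*pi*sigma^2*Q(x)Q(-x)exp(x^2), x = A/sigma.

    exp(x^2) and the N(0, 1) density are merged into one exponential to avoid overflow.
    """
    def integrand(a):
        x = a / sigma
        return (2.0 * math.pi * sigma ** 2 * q_func(x) * q_func(-x)
                * math.exp(a * a * (1.0 / sigma ** 2 - 0.5)) / math.sqrt(2.0 * math.pi))

    upper = span * sigma  # even integrand: integrate over (0, upper) and double
    h = upper / intervals
    total = integrand(0.0) + integrand(upper)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return 2.0 * total * h / 3.0


def golden_section_min(f, lo, hi, tol=1e-5):
    """Minimize a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def sigma_opt_gaussian():
    return golden_section_min(avg_crb_gaussian, 1.1, 1.3)
```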

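We can exploit (3.21) to show that the average information loss (3.13) can be made to grow as slowly as quadratically with an appropriately defined measure of SNR. Specifically, we choose the pseudo-noise power level as

$$\sigma_w^{opt}(\sigma_A) = \begin{cases} \sqrt{[\sigma_{opt}(\sigma_A)]^2 - \sigma_v^2} & \text{if } \sigma_{opt}(\sigma_A) > \sigma_v \\ 0 & \text{otherwise} \end{cases} \tag{3.22}$$

where $\sigma_{opt}(\sigma_A)$ is given by (3.20). With this choice, the average encoding performance is given by (3.21). In conjunction with (2.8), and letting $\chi = \sigma_A/\sigma_v$, this gives

$$\bar{\mathcal{L}}_{pn}(\chi) = \bar{\mathcal{L}}(1, \sigma_{opt}(1))\, \chi^2 \tag{3.23}$$

for $\chi > 1/\sigma_{opt}(1)$. For comparison, performance degrades rapidly at high SNR $\chi$ if $w[n] = 0$:

$$\bar{\mathcal{L}}_{free}(\chi) = \frac{\sqrt{2\pi}}{\chi} \int_{-\infty}^{\infty} Q(u)\, Q(-u) \exp\left(u^2 - \frac{u^2}{2\chi^2}\right) du , \tag{3.24}$$

which is finite¹ only for $\chi < 1$. Furthermore, due to the optimal selection of the pseudo-noise power level in (3.22), we have

$$\bar{\mathcal{L}}_{pn}(\chi) \le \bar{\mathcal{L}}_{free}(\chi) \tag{3.25}$$

for all $\chi$.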

For $m_A \neq 0$, the optimal pseudo-noise level depends both on $\sigma_A$ and on $m_A$. For $\sigma_A \ll |m_A|$, the random variable $A$ is effectively distributed within a very small region around $m_A$. In that case the worst-case performance analysis of Chapter 2 can be used to accurately predict average performance (where $A$ is replaced with $m_A$):

$$\sigma_{opt}(\sigma_A, m_A) = \arg\min_{\sigma_v} \bar{B}(\sigma_A, m_A, \sigma_v) \approx \arg\min_{\sigma_v} B(m_A; \sigma_v) \approx \frac{2}{\pi}\, |m_A| ,$$

where $B(m_A; \sigma_v)$ is given by the right-hand side of (2.16) for $A = m_A$ and $\sigma_\alpha = \sigma_v$. For arbitrary $m_A$ and $\sigma_A$, the optimal pseudo-noise level is accurately approximated by

$$\sigma_{opt}(\sigma_A, m_A) \approx \left[(1.1395)^r \sigma_A^r + (2/\pi)^r |m_A|^r\right]^{1/r} , \tag{3.26}$$

with $r = 1.6$. Fig. 3-7 shows the accuracy of the approximation (3.26) (dashed) in predicting the optimal pseudo-noise level $\sigma_{opt}(\sigma_A, m_A)$ obtained via numerical optimization (solid).

Figure 3-7: Each solid curve depicts the numerically computed value of $\sigma_{opt}(\sigma_A, m_A)$ as a function of $\sigma_A$ for a given $m_A$. The dashed curves correspond to the associated predicted values based on (3.26).

¹The information loss in the encodings is finite for any $N < \infty$. The fact that $\bar{\mathcal{L}}_{free}(\sigma_A/\sigma_v)$ diverges for $\sigma_A > \sigma_v$ simply implies that the information loss in the encodings is an increasing function of $N$ with no upper bound.

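The MAP estimate of $A$ given $y^N$ can be readily implemented by means of the EM algorithm described in App. A.5; use Eqns. (A.36) and (A.37) in conjunction with (A.28) and (A.30). In this case these equations specialize to the following algorithm

$$\hat{A}_{EM}^{(k+1)} = m_A + \frac{1}{1 + \frac{\sigma_\alpha^2}{N \sigma_A^2}} \left[\hat{A}_{EM}^{(k)} - m_A + \frac{\sigma_\alpha}{\sqrt{2\pi}\, N} \sum_{m=1}^{M} \mathcal{K}_m(y^N) \frac{\exp\left(-\frac{(X_{m-1} - \hat{A}_{EM}^{(k)})^2}{2\sigma_\alpha^2}\right) - \exp\left(-\frac{(X_m - \hat{A}_{EM}^{(k)})^2}{2\sigma_\alpha^2}\right)}{Q\left(\frac{X_{m-1} - \hat{A}_{EM}^{(k)}}{\sigma_\alpha}\right) - Q\left(\frac{X_m - \hat{A}_{EM}^{(k)}}{\sigma_\alpha}\right)}\right] , \tag{3.27a}$$

and where $\hat{A}_{MAP}(y^N)$ is given by

$$\hat{A}_{MAP}(y^N) = \lim_{k \to \infty} \hat{A}_{EM}^{(k)} . \tag{3.27b}$$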

In  general,  for  large  N  the  MAP  estimate  (3.27)  is  approximately  given  by  the  ML  estimate. 
Thus  asymptotically  it  achieves  the  average  information  loss  (3.13). 
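For $M = 2$, (3.27a) depends on the data only through the counts of $-1$ and $+1$ outputs. This assumes our conventions of thresholds $X_0 = -\infty$, $X_1 = 0$, $X_2 = \infty$ and levels $\pm 1$. The sketch below iterates (3.27a). As a check, it compares the limit with a grid maximization of the log-posterior

$$\mathcal{K}_2 \ln Q(-A/\sigma_\alpha) + \mathcal{K}_1 \ln Q(A/\sigma_\alpha) - (A - m_A)^2/(2\sigma_A^2) .$$

The function names and the iteration count are ours.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def map_em_pseudo_noise(n_minus, n_plus, m_a, sigma_a, sigma_alpha, iters=500):
    """Iterate (3.27a) for M = 2 and return the approximate MAP estimate (3.27b)."""
    n = n_minus + n_plus
    shrink = 1.0 / (1.0 + sigma_alpha ** 2 / (n * sigma_a ** 2))
    a = m_a
    for _ in range(iters):
        z = (0.0 - a) / sigma_alpha  # normalized threshold (X_1 - A) / sigma_alpha
        # Level 1 occupies (X_0, X_1] = (-inf, 0]; level 2 occupies (0, inf).
        term_minus = n_minus * (0.0 - pdf(z)) / (1.0 - q_func(z))
        term_plus = n_plus * (pdf(z) - 0.0) / (q_func(z) - 0.0)
        a = m_a + shrink * (a - m_a + sigma_alpha * (term_minus + term_plus) / n)
    return a


def log_posterior(a, n_minus, n_plus, m_a, sigma_a, sigma_alpha):
    """Log posterior of A (up to a constant) for the Gaussian prior N(m_a, sigma_a^2)."""
    return (n_plus * math.log(q_func(-a / sigma_alpha))
            + n_minus * math.log(q_func(a / sigma_alpha))
            - (a - m_a) ** 2 / (2.0 * sigma_a ** 2))
```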

Known  Control  Inputs 

A priori information can also be incorporated in the system (2.1)-(2.2) to enhance performance in the case where the control sequence is known for estimation. Specifically, optimized encodings together with the corresponding MAP estimators that approach the associated bounds (3.12) can be constructed.

For  brevity  we  again  focus  on  the  system  corresponding  to  M  =  2.  We  simply  need 
to  consider  the  case  where  the  Gaussian  random  variable  A  is  zero-mean;  given  that  the 
control  input  is  known  to  the  estimator,  the  encoding  strategies  we  develop  in  this  section 
for  the  zero-mean  case  can  be  readily  modified  to  accommodate  the  general  case. 

Unlike the range-limited information-bearing signals considered in Chapter 2 and Section 3.2.1, periodic control inputs are inadequate for achieving optimal performance². For this reason we consider aperiodic control inputs, and in particular control inputs that are sample paths of an IID zero-mean Gaussian random process with power level $\sigma_w^2$. The objective is to determine the power level of the control input signal that optimizes the encoding performance in terms of minimizing (3.12) or (3.13).

²Although as $N \to \infty$ we cannot rely on periodic inputs to achieve optimal performance, for any finite $N$, no matter how large, we can develop periodic inputs that are approximately optimal.

To distinguish it from $\bar{B}(\sigma_A, \sigma_v, \sigma_w)$ (the average encoding performance in the case that the estimator only exploits the statistical characterization of $w[n]$), we will use $\bar{B}(\sigma_A, \sigma_v; \sigma_w)$ to denote the average encoding performance for a given set of $\sigma_A$, $\sigma_v$, and $\sigma_w$, where $w[n]$ is an IID Gaussian process of power level $\sigma_w^2$, and where $w[n]$ is known for estimation. For any given value of the random variable $A$, the Cramer-Rao bound on all unbiased estimates of $A$ from $y[n]$, where $w[n]$ is known to the estimator and is a sample path of an IID Gaussian process, satisfies


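$$B(A; y, p_w(\cdot)) = \left(\int_{-\infty}^{\infty} \{B(A + w; y)\}^{-1} p_w(w)\, dw\right)^{-1} \tag{3.28a}$$

$$= \left(\int_{-\infty}^{\infty} \frac{\exp\left(-(A + w)^2/\sigma_v^2\right)}{2\pi \sigma_v^2\, Q\left(\frac{A + w}{\sigma_v}\right) Q\left(-\frac{A + w}{\sigma_v}\right)} \, \frac{1}{\sqrt{2\pi}\,\sigma_w} \exp\left(-\frac{w^2}{2\sigma_w^2}\right) dw\right)^{-1} \tag{3.28b}$$

$$= \left(\frac{1}{\sqrt{2\pi}\,\sigma_v \sigma_w} \int_{-\infty}^{\infty} \frac{\exp(-u^2)}{2\pi\, Q(u)\, Q(-u)} \exp\left(-\frac{(A - \sigma_v u)^2}{2\sigma_w^2}\right) du\right)^{-1} \tag{3.28c}$$

$$\approx \sqrt{2\pi}\, \gamma\, \sigma_v \sigma_w \exp\left(\frac{A^2}{2\sigma_w^2}\right) , \tag{3.28d}$$

where approximation (3.28d) is valid for $\sigma_v \ll \sigma_w$, and where

$$\gamma = \left[\int_{-\infty}^{\infty} \frac{\exp(-u^2)}{2\pi\, Q(u)\, Q(-u)}\, du\right]^{-1} \approx 0.5536 .$$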
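The constant $\gamma$ and the resulting high-SNR slope $\sqrt{8\pi}\,\gamma$ of the known-input scheme are easy to evaluate numerically. This is a minimal sketch using Simpson's rule on $[-10, 10]$. The truncation limit and the step count are our choices; beyond that range the integrand is below $10^{-19}$.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gamma_constant(limit=10.0, intervals=4000):
    """gamma = [ integral exp(-u^2) / (2*pi*Q(u)*Q(-u)) du ]^(-1), via Simpson's rule."""
    def f(u):
        return math.exp(-u * u) / (2.0 * math.pi * q_func(u) * q_func(-u))

    h = 2.0 * limit / intervals
    total = f(-limit) + f(limit)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * f(-limit + i * h)
    return 1.0 / (total * h / 3.0)


def known_input_loss_slope():
    """Coefficient sqrt(8*pi) * gamma that multiplies the SNR chi at high SNR."""
    return math.sqrt(8.0 * math.pi) * gamma_constant()
```

The integrand equals the Fisher information of a unit-variance binary quantizer whose threshold sits at offset $u$. $1/\gamma$ is therefore that information integrated over all threshold offsets.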


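The average encoding performance $\bar{B}(\sigma_A, \sigma_v; \sigma_w)$ is then given by substituting (3.28d) in (3.12)

$$\bar{B}(\sigma_A, \sigma_v; \sigma_w) \approx \int_{-\infty}^{\infty} \sqrt{2\pi}\, \gamma\, \sigma_v \sigma_w \exp\left(\frac{A^2}{2\sigma_w^2}\right) \frac{1}{\sqrt{2\pi}\, \sigma_A} \exp\left(-\frac{A^2}{2\sigma_A^2}\right) dA \tag{3.29a}$$

$$= \sqrt{2\pi}\, \gamma\, \sigma_v \frac{\sigma_w^2}{\sqrt{\sigma_w^2 - \sigma_A^2}} , \tag{3.29b}$$

which is finite if and only if $\sigma_w > \sigma_A$. In particular, the value of $\sigma_w$ that minimizes $\bar{B}(\sigma_A, \sigma_v; \sigma_w)$ for $\sigma_A \gg \sigma_v$ (large SNR $\chi$) is $\sigma_w = \sqrt{2}\,\sigma_A$, in which case (3.29b) reduces to

$$\bar{B}(\sigma_A, \sigma_v; \sigma_w^{opt}) \approx \sqrt{8\pi}\, \gamma\, \sigma_v \sigma_A .$$

Specifically, if we select

$$\sigma_w^{opt}(\sigma_A) = \begin{cases} \sqrt{2\sigma_A^2 - \sigma_v^2} & \text{if } \sigma_A > \sqrt{2}\, \sigma_v \\ 0 & \text{otherwise} \end{cases} \tag{3.30}$$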
Figure 3-8: Average information loss as a function of signal-to-noise ratio $\chi$ for no control inputs (upper solid) and for optimally designed pseudo-noise (middle solid) and known (lower solid) inputs in the case $M = 2$. Both the IID sensor noise and the a priori PDF are zero-mean Gaussian. The control input is a typical sample path of an IID Gaussian process of power level selected according to (3.22) and (3.30), respectively. The successively lower dashed lines show the high-SNR performance, as predicted by (3.23) and (3.31), respectively. The dotted line depicts the 2 dB lower bound.


and by using the fact that $B(A; s) = \sigma_v^2$, we get

$$\bar{\mathcal{L}}_{kn}(\chi) \approx \sqrt{8\pi}\, \gamma\, \chi \tag{3.31}$$

for high SNR. That is, by proper choice of the energy level of the Gaussian control input, we can make the average information loss grow as slowly as linearly with SNR.

The information loss of this scheme is depicted in Fig. 3-8 as a function of SNR $\chi$. The figure also depicts the high-SNR average performance for optimized pseudo-noise (upper dashed) and known (lower dashed) control inputs, as predicted by (3.23) and the approximation (3.31), respectively. Although the encoding scheme and the criterion used to assess the encoding quality, as well as the a priori assumptions about the information-bearing signal, differ substantially from the unknown parameter case considered in Chapter 2, the resulting performance is strikingly similar.

The MAP estimator for any known control input sequence is very similar to (3.27), which was used in the pseudo-noise case. Specifically, it can be readily implemented by means of the following EM algorithm, which is a special case of the algorithm derived in App. A.5:


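$$\hat{A}_{EM}^{(k+1)} = m_A + \frac{1}{1 + \frac{\sigma_v^2}{N\sigma_A^2}} \left[\hat{A}_{EM}^{(k)} - m_A + \frac{\sigma_v}{\sqrt{2\pi}\, N} \sum_{n=1}^{N} y[n] \frac{\exp\left(-(z^{(k)}[n])^2/2\right)}{Q\left(y[n]\, z^{(k)}[n]\right)}\right] , \tag{3.32a}$$

where

$$z^{(k)}[n] = \frac{-w[n] - \hat{A}_{EM}^{(k)}}{\sigma_v} , \tag{3.32b}$$

and the MAP estimate $\hat{A}_{MAP}(y^N)$ is then given by

$$\hat{A}_{MAP}(y^N) = \lim_{k \to \infty} \hat{A}_{EM}^{(k)} . \tag{3.32c}$$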


Feedback  in  Control  Input  Selection 

A  priori  information  can  also  be  incorporated  in  the  system  (2.1)-(2.2)  employing  feedback 
in  the  selection  of  the  control  sequence.  Specifically,  average  encoding  performance  bounds 
and  corresponding  MAP  estimators  that  asymptotically  approach  these  bounds  can  be 
constructed. 

Again we focus on the system corresponding to $M = 2$. We can design MAP estimators for the feedback case; these asymptotically (large $N$) attain the performance of the ML estimators based on feedback (developed in Chapter 2) and thus for any $A$ asymptotically achieve the 2 dB bound. Consequently, in this case the average performance is the same as the worst-case performance.

In general, for finite $N$, the performance bounds of these MAP solutions will be dependent on $N$. In determining the lowest possible achievable Cramer-Rao bound for estimating $A \sim \mathcal{N}(m_A, \sigma_A^2)$ based on observation of $y^N$ from (2.1)-(2.2), we allow the selected control vector $w^N$ to depend on the particular value of $A$. Specifically, let $B(y^N; w^N)$ denote the Cramer-Rao bound for estimating $A$ resulting from a particular selection method for the control input $w^N$ based on observation of $y^N$. We may use the Cramer-Rao bound on unbiased estimates in the case that $A$ is a random variable, which, when the control input is known, satisfies


$$B(y^N; w^N) = \left[E\left[\{B(A; y^N, w^N)\}^{-1}\right] + \sigma_A^{-2}\right]^{-1} \tag{3.33a}$$

$$= \left(\sum_{n=1}^{N} E\left[\{B(A + w[n]; y)\}^{-1}\right] + \sigma_A^{-2}\right)^{-1} \tag{3.33b}$$

$$\ge \left(\frac{2N}{\pi\sigma_v^2} + \sigma_A^{-2}\right)^{-1} , \tag{3.33c}$$

where $B(A; y^N, w^N)$ and $B(A + w[n]; y)$ are given by (2.23) and (2.16), respectively. Ineq. (3.33c) provides a bound on the performance of any unbiased estimator of $A$ from $y^N$, for any selection of the control sequence $w^N$. Note that (3.33c) results from application of (2.37), with equality achieved for $w[n] = -A$. Since such a control sequence is not feasible (due to its dependence on the unknown parameter $A$), in a manner analogous to (2.52) we may select the control sequence as follows

$$w[n] = -\hat{A}_{MAP}(y^{n-1}) . \tag{3.34}$$


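The corresponding MAP estimator can be obtained from the ML estimation algorithm (2.53) with minor modifications, and can be derived as a special case of the algorithm described in App. A.5:

$$\hat{A}_{EM}^{(k+1)}[n] = m_A + \frac{1}{1 + \frac{\sigma_v^2}{n\sigma_A^2}} \left[\hat{A}_{EM}^{(k)}[n] - m_A + \frac{\sigma_v}{\sqrt{2\pi}\, n} \sum_{m=1}^{n} y[m] \frac{\exp\left(-\frac{(\hat{A}_{EM}^{(k)}[n] - \hat{A}_{MAP}[m-1])^2}{2\sigma_v^2}\right)}{Q\left(y[m]\, \frac{\hat{A}_{MAP}[m-1] - \hat{A}_{EM}^{(k)}[n]}{\sigma_v}\right)}\right] , \tag{3.35a}$$

$$\hat{A}_{MAP}[n] = \hat{A}_{MAP}(y^n) = \lim_{k \to \infty} \hat{A}_{EM}^{(k)}[n] . \tag{3.35b}$$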
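The following is a minimal sketch of the feedback scheme (3.34)-(3.35) for $M = 2$. It assumes $\pm 1$ outputs, a threshold at zero, and Gaussian sensor noise. The control input $w[m] = -\hat{A}_{MAP}[m-1]$ places the effective threshold at $t[m] = \hat{A}_{MAP}[m-1]$. The same routine therefore also covers a known control input as in (3.32), by passing $t[m] = -w[m]$.

Each time step warm-starts the EM iteration at the previous estimate. The fixed iteration counts and all names are our choices, so this is an illustration rather than a tuned implementation.

```python
import math
import random


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def map_em_step(y, t, m_a, sigma_a, sigma_v, a0, iters=200):
    """Iterate (3.35a) for observations y[0..n-1] with effective thresholds t[0..n-1]."""
    n = len(y)
    shrink = 1.0 / (1.0 + sigma_v ** 2 / (n * sigma_a ** 2))
    a = a0
    for _ in range(iters):
        corr = 0.0
        for y_m, t_m in zip(y, t):
            z = (t_m - a) / sigma_v
            corr += y_m * pdf(z) / q_func(y_m * z)
        a = m_a + shrink * (a - m_a + sigma_v * corr / n)
    return a


def log_posterior_grad(a, y, t, m_a, sigma_a, sigma_v):
    """Derivative of the log posterior of A; it vanishes at the MAP estimate."""
    s = sum((y_m / sigma_v) * pdf((t_m - a) / sigma_v) / q_func(y_m * (t_m - a) / sigma_v)
            for y_m, t_m in zip(y, t))
    return s - (a - m_a) / sigma_a ** 2


def simulate_feedback_map(a_true, m_a, sigma_a, sigma_v, num_steps, iters=50, seed=0):
    """Run the closed loop (3.34)-(3.35) for num_steps samples and return A_MAP[N]."""
    rng = random.Random(seed)
    ys, ts = [], []
    a_hat = m_a  # A_MAP[0] is taken to be the prior mean
    for _ in range(num_steps):
        w = -a_hat  # control input (3.34)
        ys.append(1 if a_true + rng.gauss(0.0, sigma_v) + w > 0 else -1)
        ts.append(a_hat)  # effective threshold seen by this sample
        a_hat = map_em_step(ys, ts, m_a, sigma_a, sigma_v, a_hat, iters)
    return a_hat
```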

Empirical evidence suggests that the MAP estimate (3.35), in conjunction with selecting $w[n]$ according to (3.34), achieves the minimum possible information loss (3.33c) for moderate $N$ values, similarly to its ML counterpart. Note that in the presence of a priori information, and for $\sigma_A \ll \sigma_v$, the control sequence $w[n]$ enables immediate operation around the quantizer threshold, and thus quicker convergence to the corresponding minimum possible information loss (3.33c). However, for large enough $N$, where the information from the available observations dominates the a priori information, we may also replace the MAP algorithm (3.35) with the low-complexity estimator (2.58) without compromising performance.

Figure 3-9: Performance based on Monte-Carlo simulations (solid curve) of the MAP estimator of the random parameter $A$ based on observations from a binary quantizer, where the control input at time $n$ equals the negative of the estimate at time $n-1$. The dotted curves correspond to the Cramer-Rao bounds for estimating $A$ based on the infinite-resolution sequence and on the quantized sequence with the best possible control sequence selection.
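The recursion (3.35) combined with the feedback rule (3.34) can be exercised numerically. The Python sketch below is ours, not the thesis code, and it rests on assumptions that the text does not fix here: a binary quantizer with threshold at zero and outputs $\pm 1$, Gaussian sensor noise, a zero-mean Gaussian prior on $A$ with standard deviation $\sigma_A$, and a fixed number of EM iterations in place of the limit in (3.35b). It also evaluates the right-hand side of (3.33c) using $B(0; y) = \pi\sigma_v^2/2$, which is the single-sample bound at the threshold for a binary quantizer in Gaussian noise. The names `map_em`, `run_trial`, and `bound_333c` are illustrative only.

```python
import numpy as np
from scipy.special import log_ndtr

LOG_SQRT_2PI = 0.5 * np.log(2.0 * np.pi)


def map_em(y, w, sigma_v, sigma_a, a_init=0.0, iters=20):
    """EM approximation of the MAP estimate of A from y[m] = sgn(A + v[m] + w[m]),
    v[m] ~ N(0, sigma_v^2), under an assumed prior A ~ N(0, sigma_a^2).
    The limit k -> infinity of (3.35b) is truncated to `iters` iterations."""
    n = len(y)
    shrink = n * sigma_a**2 / (n * sigma_a**2 + sigma_v**2)  # prior weighting
    a = a_init
    for _ in range(iters):
        z = (a + w) / sigma_v
        # E-step: E[v[m] | y[m]] = sigma_v y[m] f(z) / Q(-y[m] z), in the log domain
        ratio = np.exp(-0.5 * z**2 - LOG_SQRT_2PI - log_ndtr(y * z))
        s_hat = a + sigma_v * y * ratio
        # M-step: posterior mode of A given the conditional means s_hat
        a = shrink * s_hat.mean()
    return a


def run_trial(A, N, sigma_v, sigma_a, rng):
    """One realization with the feedback control w[n] = -A_MAP[n-1] of (3.34)."""
    y, w = np.empty(N), np.empty(N)
    a_map = 0.0  # before any data: the (assumed zero) prior mean
    for n in range(N):
        w[n] = -a_map
        y[n] = 1.0 if A + sigma_v * rng.standard_normal() + w[n] >= 0 else -1.0
        a_map = map_em(y[: n + 1], w[: n + 1], sigma_v, sigma_a, a_init=a_map)
    return a_map


def bound_333c(N, sigma_v, sigma_a):
    """Right-hand side of (3.33c) with B(0; y) = pi sigma_v^2 / 2."""
    return 1.0 / (N / (0.5 * np.pi * sigma_v**2) + 1.0 / sigma_a**2)


rng = np.random.default_rng(0)
N, sigma_v, sigma_a, trials = 50, 1.0, 0.5, 100
sq_err = [(run_trial(A, N, sigma_v, sigma_a, rng) - A) ** 2
          for A in rng.normal(0.0, sigma_a, trials)]
print(f"empirical MSE {np.mean(sq_err):.4f}, bound (3.33c) {bound_333c(N, sigma_v, sigma_a):.4f}")
```

Warm-starting each EM run at the previous MAP estimate keeps the number of iterations per time step small, since consecutive estimates differ little once $n$ is moderate.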

3.3  Unknown  Noise  Power  Level 

Another important extension of the estimation problem considered in Chapter 2 involves estimation of the unknown parameter of interest when, in addition, the noise power level is unknown. Specifically, consider the problem of estimating the unknown parameter $A$, and possibly the unknown noise power level $\sigma_v$, from observation of

$$ y[n] = F(A + \sigma_v v[n] + w[n]) \tag{3.36} $$

where $v[n]$ is an IID process of known statistical characterization, $w[n]$ is a control input, and $F(\cdot)$ is an $M$-level quantizer given by (2.2).


3.3.1  Performance  Limits 


In order to assess the performance of the encoding strategies we develop, we rely on extensions of the figures of merit developed in Chapter 2. Specifically, let $\theta = [A\ \sigma_v]^T$ denote the vector of unknown parameters, and for convenience let $\theta_1 = A$ and $\theta_2 = \sigma_v$. Let $B(\theta; y^N)$ denote the $2 \times 2$ Cramer-Rao bound matrix for unbiased estimates of the vector parameter $\theta$ from observation of $y^N$. Then

$$ E\left[ \left( \hat{A}(y^N) - A \right)^2 \right] \ge \left[ B(\theta; y^N) \right]_{1,1} $$

and

$$ E\left[ \left( \hat{\sigma}_v(y^N) - \sigma_v \right)^2 \right] \ge \left[ B(\theta; y^N) \right]_{2,2} $$

where $\hat{A}(y^N)$ and $\hat{\sigma}_v(y^N)$ are any unbiased estimators of $A$ and $\sigma_v$, respectively.

Analogously to the known-$\sigma_v$ case, we use as our measure of the quality of the encoding strategy the following notion of information loss

$$ \mathcal{L}(A, \sigma_v) = \frac{\left[ B(\theta; y^N) \right]_{1,1}}{\left[ B(\theta; s^N) \right]_{1,1}} \,. \tag{3.37} $$


We assume that the range of the parameter of interest $A$ is $(-\Delta, \Delta)$, while the unknown noise level satisfies $\sigma_{\min} < \sigma_v$. Worst-case performance is used as a measure of the encoding quality, i.e.,

$$ \mathcal{L}_{\max}(\Delta, \sigma_{\min}) = \sup_{A \in (-\Delta, \Delta),\ \sigma_v > \sigma_{\min}} \mathcal{L}(A, \sigma_v) \,. \tag{3.38} $$

We focus our attention on the case where $v[n]$ is a zero-mean Gaussian noise process of unit variance; similar results can be developed for non-Gaussian admissible sensor noises.


Pseudo-noise  Control  Inputs 

In this section we assume that the estimator can only exploit knowledge of the statistical characterization of the control input $w[n]$ for estimation. In particular, we assume that the control input can be modeled as an IID zero-mean Gaussian sequence of power $\sigma_w^2$. As usual, absence of a control input corresponds to the special case $\sigma_w = 0$. Consider the $2 \times 2$ Fisher information matrix associated with the Cramer-Rao bound matrix $B(\theta; y^N)$, i.e.,

$$ \mathcal{I}(\theta; y^N) = \left[ B(\theta; y^N) \right]^{-1} \,. \tag{3.39} $$


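The $(i, j)$th entry of the Fisher information matrix can be obtained by partial differentiation of the log-likelihood with respect to $\theta_i$ and $\theta_j$, followed by an expectation, and can be put in the following form

$$
\left[ \mathcal{I}(\theta; y^N) \right]_{i,j} =
\begin{cases}
\dfrac{N}{\sigma_\alpha^2} \displaystyle\sum_{m=1}^{M} \frac{\beta_m^2}{\gamma_m} & \text{if } i = j = 1 \ (\text{i.e., } \theta_i = \theta_j = A)\\[2ex]
\dfrac{N \sigma_v^2}{\sigma_\alpha^4} \displaystyle\sum_{m=1}^{M} \frac{\delta_m^2}{\gamma_m} & \text{if } i = j = 2 \ (\text{i.e., } \theta_i = \theta_j = \sigma_v)\\[2ex]
\dfrac{N \sigma_v}{\sigma_\alpha^3} \displaystyle\sum_{m=1}^{M} \frac{\beta_m\, \delta_m}{\gamma_m} & \text{if } i \ne j
\end{cases}
\tag{3.40a}
$$

where

$$
\begin{aligned}
\gamma_m &= Q\!\left( \frac{X_{m-1} - A}{\sigma_\alpha} \right) - Q\!\left( \frac{X_m - A}{\sigma_\alpha} \right), && \text{(3.40b)}\\
\delta_m &= \frac{X_{m-1} - A}{\sigma_\alpha}\, f\!\left( \frac{X_{m-1} - A}{\sigma_\alpha} \right) - \frac{X_m - A}{\sigma_\alpha}\, f\!\left( \frac{X_m - A}{\sigma_\alpha} \right), && \text{(3.40c)}\\
\beta_m &= f\!\left( \frac{X_{m-1} - A}{\sigma_\alpha} \right) - f\!\left( \frac{X_m - A}{\sigma_\alpha} \right), && \text{(3.40d)}
\end{aligned}
$$

$f(x) = \exp(-x^2/2)/\sqrt{2\pi}$, $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$, and $X_0 = -\infty$, $X_M = \infty$.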

In the special case $M = 2$ the determinant of $\mathcal{I}(\theta; y^N)$ equals zero, revealing that estimation of $\theta$ for $M = 2$ is an ill-posed problem. In the absence of pseudo-noise ($\sigma_w = 0$) this is easily clarified by noting that, for any $\lambda > 0$, the parameter $\theta = [A\ \sigma_v]^T$ yields the same observation sequence as the parameter $\lambda\theta = [\lambda A\ \lambda\sigma_v]^T$: given any sequence $v[n]$, and denoting the observed sequence by $y[n; \lambda]$ to make its $\lambda$-dependence explicit, we have

$$ y[n; \lambda] = \mathrm{sgn}(\lambda A + \lambda \sigma_v v[n]) = \mathrm{sgn}(A + \sigma_v v[n]) = y[n; 1] \,. $$

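Similarly, in the pseudo-noise case, for any $\lambda > 1$, any pair $(A, \sigma_v)$ is equivalent (it induces the same distribution on $y[n]$) to a pair $(\lambda A, \lambda' \sigma_v)$, where

$$ \lambda' = \frac{\sqrt{\lambda^2 (\sigma_v^2 + \sigma_w^2) - \sigma_w^2}}{\sigma_v} $$

since, writing the aggregate of sensor noise and pseudo-noise as $\sigma_\alpha\, \alpha[n]$, with $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$ and $\alpha[n]$ a unit-variance Gaussian sequence,

$$ y[n; \lambda] = \mathrm{sgn}\!\left( \lambda A + \sqrt{\lambda'^2 \sigma_v^2 + \sigma_w^2}\, \alpha[n] \right) = \mathrm{sgn}\!\left( \lambda A + \lambda \sigma_\alpha\, \alpha[n] \right) = y[n; 1] \,. $$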
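The rank deficiency for $M = 2$ can also be read off from the Fisher information matrix $\mathcal{I}(\theta; y^N)$ of $N$ independent samples directly. For a single threshold $X_1$, write $x_1 = (X_1 - A)/\sigma_\alpha$ with $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$, let $f(x) = \exp(-x^2/2)/\sqrt{2\pi}$ and $Q(x)$ be the standard Gaussian tail probability, and let $p_1 = 1 - Q(x_1)$ and $p_2 = Q(x_1)$ be the probabilities of the lower and upper quantizer outputs. The short derivation below (ours) shows that both probability gradients point along the same vector $u$, so the matrix is rank one:

$$
\begin{aligned}
\frac{\partial p_2}{\partial A} &= \frac{f(x_1)}{\sigma_\alpha}, \qquad
\frac{\partial p_2}{\partial \sigma_v} = \frac{\sigma_v\, x_1 f(x_1)}{\sigma_\alpha^2}, \qquad
\nabla_\theta p_1 = -\nabla_\theta p_2 = -\frac{f(x_1)}{\sigma_\alpha}\, u, \quad
u = \begin{bmatrix} 1 \\ \sigma_v x_1 / \sigma_\alpha \end{bmatrix},\\
\mathcal{I}(\theta; y^N) &= N \sum_{m=1}^{2} \frac{\nabla_\theta p_m\, \nabla_\theta p_m^T}{p_m}
= \frac{N f^2(x_1)}{\sigma_\alpha^2\, Q(x_1)\,[1 - Q(x_1)]}\; u\, u^T,
\qquad \det \mathcal{I}(\theta; y^N) = 0 .
\end{aligned}
$$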

For this reason, for pseudo-noise control inputs we focus on the case $M = 3$ to illustrate the encoder design. In particular, we assume that $F(\cdot)$ is a symmetric quantizer, i.e., $-X_1 = X_2 = X$. Given $\Delta$ and $\sigma_{\min}$, we wish to select the pseudo-noise power level $\sigma_w$ so as to minimize the worst-case information loss (3.38).

The worst-case performance (3.38) in this case occurs at the parameter space boundary, where $\sigma_v \to \sigma_{\min}$ and $|A| \to \Delta$. In particular, analogously to the case where the sensor noise power level is known, we may define a measure of peak signal-to-noise ratio as follows

$$ \chi = \frac{\Delta}{\sigma_{\min}} $$

via which we can characterize the encoding performance. In Fig. 3-10 we show the optimal choice of the pseudo-noise level, in the sense of minimizing the worst-case information loss (3.38), as a function of SNR for $\Delta = 1$ and $X = 0.5$. As in the case of a known sensor noise level examined in Chapter 2, it is evident that at high SNR $\Delta/\sigma_{\min}$ the optimal pseudo-noise level is independent of the sensor noise level $\sigma_{\min}$.
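The quantities in (3.37), (3.38), and (3.40) can be evaluated numerically to carry out the kind of search summarized in Fig. 3-10. The sketch below is a hypothetical helper, not code from the thesis. It assumes Gaussian sensor noise and Gaussian pseudo-noise and a symmetric three-level quantizer with thresholds $\pm X$, uses $[B(\theta; s)]_{1,1} = \sigma_v^2$ per unquantized Gaussian sample, and evaluates the worst case of (3.38) only on the boundary $\sigma_v = \sigma_{\min}$ over a grid of $A$, where the text locates it. The grid, the value $\chi = 4$, and the function names are illustrative choices. It also checks numerically that the Fisher matrix is singular for $M = 2$.

```python
import numpy as np
from scipy.stats import norm


def fisher_pseudo_noise(A, sigma_v, sigma_w, thresholds):
    """Per-sample 2x2 Fisher information (3.40a) for theta = [A, sigma_v]."""
    sigma_a = np.hypot(sigma_v, sigma_w)                  # sigma_alpha
    X = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    x = (X - A) / sigma_a
    fx = norm.pdf(x)                                      # f(+-inf) = 0
    xfx = np.zeros_like(x)
    finite = np.isfinite(x)
    xfx[finite] = x[finite] * fx[finite]                  # x f(x) -> 0 at +-inf
    lo, hi = x[:-1], x[1:]
    # gamma_m of (3.40b), computed on the tail side that avoids cancellation
    gamma = np.where(hi <= 0, norm.cdf(hi) - norm.cdf(lo), norm.sf(lo) - norm.sf(hi))
    beta = fx[:-1] - fx[1:]                               # (3.40d)
    delta = xfx[:-1] - xfx[1:]                            # (3.40c)
    g = np.vstack((beta / sigma_a, sigma_v * delta / sigma_a**2))  # d gamma_m / d theta
    return (g / gamma) @ g.T


def info_loss(A, sigma_v, sigma_w, thresholds):
    """(3.37) for one sample: [F^{-1}]_11 / sigma_v^2, via the Schur complement."""
    F = fisher_pseudo_noise(A, sigma_v, sigma_w, thresholds)
    schur = F[0, 0] - F[0, 1] ** 2 / F[1, 1]              # equals 1 / [F^{-1}]_11
    if schur <= 1e-12 * F[0, 0]:                          # numerically unidentifiable
        return np.inf
    return 1.0 / schur / sigma_v**2


def worst_case_loss(sigma_w, sigma_min, Delta=1.0, X=0.5, n_grid=101):
    """(3.38) evaluated on the boundary sigma_v = sigma_min over a grid of A."""
    return max(info_loss(A, sigma_min, sigma_w, [-X, X])
               for A in np.linspace(-Delta, Delta, n_grid))


# M = 2: the Fisher matrix is rank one, so its determinant vanishes
print(np.linalg.det(fisher_pseudo_noise(0.3, 1.0, 0.5, [0.0])))

# Grid search over the pseudo-noise level at peak SNR chi = Delta / sigma_min = 4
sigma_min = 0.25
sigma_w_grid = np.linspace(0.0, 1.0, 21)
losses = [worst_case_loss(sw, sigma_min) for sw in sigma_w_grid]
best = sigma_w_grid[int(np.argmin(losses))]
print(f"best sigma_w ~ {best:.2f}, worst-case loss {10 * np.log10(min(losses)):.1f} dB")
```

Computing $[F^{-1}]_{1,1}$ through the Schur complement rather than a general matrix inverse makes the nearly unidentifiable corner ($\sigma_w = 0$, $|A|$ close to $\Delta$) easy to detect instead of returning a meaningless inverse.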

Figure 3-10: Optimal pseudo-noise level as a function of SNR for a three-level quantizer with $X = 0.5$ and $\Delta = 1$.

The solid line in Fig. 3-11 depicts the associated worst-case information loss as a function of SNR. In the same figure the dotted curve depicts the uncoded performance, corresponding to $w[n] = 0$. For comparison, we also show the associated performance curves for the case that the sensor noise level is known (in which case the peak SNR equals $\Delta/\sigma_v$), both without a control input (dash-dot) and with pseudo-noise encoders (dashed).

Fig. 3-12 shows the additional information loss arising from lack of knowledge of the sensor noise level. As we can see, lack of knowledge of the sensor noise level comes at an additional cost of less than 8 dB encoding loss for any signal-to-noise ratio $\chi$.


Known  Control  Inputs 

We can also consider encoding strategies for the case that the estimator can fully exploit knowledge of the control input sequence used at the encoder. Naturally, we wish to construct the control input so as to minimize the worst-case information loss (3.38). In a fashion similar to the case where the noise level is known, we can show that by using periodic control inputs of the form (2.26), where $K$ is selected from (2.27) with $\chi$ replaced by $\Delta/\sigma_{\min}$, we can provide encoding strategies whose associated information loss grows linearly with $\Delta/\sigma_{\min}$.

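The $(i, j)$th entry of the Fisher information matrix can be obtained by partial differentiation of the log-likelihood with respect to $\theta_i$ and $\theta_j$, followed by an expectation. With $w[1], \ldots, w[K]$ denoting one period of the control input (and $N$ a multiple of $K$),

$$
\left[ \mathcal{I}(\theta; y^N) \right]_{i,j} =
\begin{cases}
\dfrac{N}{K \sigma_v^2} \displaystyle\sum_{k=1}^{K} \sum_{m=1}^{M} \frac{\beta_m^2[k]}{\gamma_m[k]} & \text{if } i = j = 1 \ (\text{i.e., } \theta_i = \theta_j = A)\\[2ex]
\dfrac{N}{K \sigma_v^2} \displaystyle\sum_{k=1}^{K} \sum_{m=1}^{M} \frac{\delta_m^2[k]}{\gamma_m[k]} & \text{if } i = j = 2 \ (\text{i.e., } \theta_i = \theta_j = \sigma_v)\\[2ex]
\dfrac{N}{K \sigma_v^2} \displaystyle\sum_{k=1}^{K} \sum_{m=1}^{M} \frac{\beta_m[k]\, \delta_m[k]}{\gamma_m[k]} & \text{if } i \ne j
\end{cases}
\tag{3.41a}
$$

where

$$
\begin{aligned}
\gamma_m[k] &= Q\!\left( \frac{X_{m-1} - A - w[k]}{\sigma_v} \right) - Q\!\left( \frac{X_m - A - w[k]}{\sigma_v} \right), && \text{(3.41b)}\\
\delta_m[k] &= \frac{X_{m-1} - A - w[k]}{\sigma_v}\, f\!\left( \frac{X_{m-1} - A - w[k]}{\sigma_v} \right) - \frac{X_m - A - w[k]}{\sigma_v}\, f\!\left( \frac{X_m - A - w[k]}{\sigma_v} \right), && \text{(3.41c)}\\
\beta_m[k] &= f\!\left( \frac{X_{m-1} - A - w[k]}{\sigma_v} \right) - f\!\left( \frac{X_m - A - w[k]}{\sigma_v} \right), && \text{(3.41d)}
\end{aligned}
$$

and $f(x) = \exp(-x^2/2)/\sqrt{2\pi}$.

Figure 3-11: Information loss as a function of SNR in the absence of a control input (dotted) and in the presence of an optimally selected pseudo-noise level (solid). For comparison, the associated performance curves for a known sensor noise level are shown.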

In Fig. 3-13 we show the performance (solid) in terms of the worst-case information loss as a function of SNR. We also show the associated performance in the case that the noise power level is known at the estimator (dashed). As the figure illustrates, lack of knowledge of the noise power level comes at a cost that is upper-bounded by about 3 dB at low SNR, while at high SNR the additional loss is negligible.
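The Fisher information (3.41) is straightforward to evaluate for any given control period. The sketch below assumes Gaussian sensor noise and a symmetric three-level quantizer with thresholds $\pm 0.5$. The evenly spaced control period is a hypothetical stand-in chosen only for illustration; it is not the form (2.26)/(2.27) referenced above. Comparing $[F^{-1}]_{1,1}$ with $1/F_{1,1}$ isolates the cost of not knowing $\sigma_v$, the effect plotted in Fig. 3-13. The former can never be smaller, because it is the reciprocal of a Schur complement of $F_{1,1}$.

```python
import numpy as np
from scipy.stats import norm


def fisher_known_control(A, sigma_v, thresholds, w_period, N=1):
    """2x2 Fisher information (3.41a) for theta = [A, sigma_v] from N samples
    (N assumed to be a multiple of K = len(w_period)) when one period
    w[1], ..., w[K] of the control input is known to the estimator."""
    X = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    F = np.zeros((2, 2))
    for wk in np.asarray(w_period, float):
        x = (X - A - wk) / sigma_v
        fx = norm.pdf(x)
        xfx = np.zeros_like(x)
        finite = np.isfinite(x)
        xfx[finite] = x[finite] * fx[finite]          # x f(x) -> 0 at +-inf
        lo, hi = x[:-1], x[1:]
        gamma = np.where(hi <= 0, norm.cdf(hi) - norm.cdf(lo),
                         norm.sf(lo) - norm.sf(hi))   # (3.41b)
        # rows: beta_m[k] / sigma_v and delta_m[k] / sigma_v, per (3.41c)-(3.41d)
        g = np.vstack((fx[:-1] - fx[1:], xfx[:-1] - xfx[1:])) / sigma_v
        F += (g / gamma) @ g.T
    return (N / len(w_period)) * F


def losses(A, sigma_v, thresholds, w_period):
    """Per-sample information loss in A, relative to sigma_v^2, when sigma_v is
    unknown ([F^{-1}]_11) and when it is known (1 / F_11)."""
    F = fisher_known_control(A, sigma_v, thresholds, w_period)
    unknown = 1.0 / (F[0, 0] - F[0, 1] ** 2 / F[1, 1])   # Schur-complement form
    known = 1.0 / F[0, 0]
    return unknown / sigma_v**2, known / sigma_v**2


w_period = np.linspace(-1.0, 1.0, 21)   # hypothetical control period, not (2.26)
pairs = [losses(A, 0.1, [-0.5, 0.5], w_period) for A in np.linspace(-1.0, 1.0, 101)]
worst_unknown = max(p[0] for p in pairs)
worst_known = max(p[1] for p in pairs)
print(f"worst-case loss: {10 * np.log10(worst_unknown):.2f} dB (sigma_v unknown), "
      f"{10 * np.log10(worst_known):.2f} dB (sigma_v known)")
```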




Figure 3-12: Additional worst-case information loss arising from lack of knowledge of the sensor noise level $\sigma_v$.


Figure  3-13:  Worst-case  information  loss  for  known  control  input,  in  the  case  the  sensor 
noise  level  is  known  (dashed)  and  unknown  (solid). 

Control  Inputs  in  the  Presence  of  Feedback 

As in the known sensor noise level case, exploiting feedback in the design of the quantized encodings can yield substantial benefits in terms of the associated information and MSE loss. Although feedback can also be exploited in the case $M = 2$, for purposes of illustration we restrict our attention to the case $M = 3$ involving a symmetric quantizer. In that case, for any $\sigma_v$ we have

$$ \left[ B([A\ \sigma_v]^T; y) \right]_{i,i} \ge \left[ B([0\ \sigma_v]^T; y) \right]_{i,i} $$

for $i = 1, 2$, which reveals that

$$ \left[ B([A\ \sigma_v]^T; y^N, w^N) \right]_{i,i} \ge \left[ B([0\ \sigma_v]^T; y) \right]_{i,i} / N \,, $$

where equality is achieved if $w[n] = -A$ for $n = 1, 2, \ldots, N$. In the presence of feedback, the performance corresponding to $w[n] = -A$ can be practically achieved by using encodings of the form

$$ w[n] = -\hat{A}[n-1] \tag{3.42} $$

where $\hat{A}[n-1]$ is a consistent estimate of $A$.


3.3.2  Estimation  Algorithms 

In App. B we present an EM algorithm which, under the condition that the likelihood function has a single local maximum over the parameter range of interest, results in the ML estimate of the unknown parameter vector $\theta = [A\ \sigma_v]^T$. Depending on the particular case, this EM algorithm specializes to a number of different forms.

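For pseudo-noise control inputs, the ML estimates $\hat{A}_{\rm ML}[N]$ and $\hat{\sigma}_{\rm ML}[N]$ of $A$ and $\sigma_v$, respectively, are given by

$$ \hat{A}_{\rm ML}[N] = \lim_{k \to \infty} \hat{A}^{(k)}_{\rm EM} \tag{3.43a} $$

$$ \hat{\sigma}_{\rm ML}[N] = \lim_{k \to \infty} \hat{\sigma}^{(k)}_{\rm EM} \tag{3.43b} $$

where $\hat{A}^{(k)}_{\rm EM}$ and $\hat{\sigma}^{(k)}_{\rm EM}$ are given by (B.5a) and (B.5b) with $I = N$, and where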


£  =  \/  <7  min  + 
(3.43c)

(3.43d) 

(3.43e) 

(3.43f) 

(3.43g) 


-'EM 


In Fig. 3-14 we present the MSE performance of this EM algorithm for $X = 0.5$, $N = 1000$, $A = 0.1$, $\Delta = 1$, and $\sigma_{\min} = 0.05$, for several values of the sensor noise level $\sigma_v$, in the two cases $\sigma_w = 0$ and $\sigma_w = 0.25$. As we can see, in both cases the information loss metric (3.37) accurately predicts the MSE loss performance of the EM algorithm.

Figure 3-14: MSE loss in the parameter $A$ from quantized encodings with pseudo-noise control inputs, as a function of sensor noise level, for $\sigma_w = 0.25$.

Similarly, in Fig. 3-15 we depict the MSE in $A$ and $\sigma_v$ of the EM algorithm of App. B when feedback is available and is exploited in the form of (3.42), for a symmetric quantizer with $M = 3$ and $X = 0.1$, in the case that $\theta = [0.5\ 0.1]^T$. As the figure reveals, feedback in conjunction with the EM algorithm of App. B achieves the optimal performance within a few iterations.
Figure 3-15: MSE performance of the EM algorithm of App. B for estimating the parameters $A$ (upper figure) and $\sigma_v$ (lower figure) from quantized encodings in the presence of feedback exploited via (3.42). The dashed lines correspond to the performance predicted by the Cramer-Rao bounds at $\theta^* = (A^*, \sigma_v)$. The dotted lines correspond to the Cramer-Rao bounds for estimation of the parameter $\theta$ based on the original observations $s^N$.
Chapter 4


Optimized Encoding Strategies for the Static Case


Encoders based on quantizer bias control are very attractive for digitally encoding noisy measurements, since they can be designed to provide favorable tradeoffs between encoder complexity and performance. However, although these encoders can achieve performance that does not degrade with SNR by exploiting feedback, these systems are inherently limited in the sense that, in general, they incur a small information loss.

In this chapter we examine the problem of eliminating these performance losses by allowing more freedom in the encoder design. This problem may arise, for instance, in the context of distributed networks of wireless sensors, where bandwidth constraints limit the effective data rate (or, equivalently, bits per measurement) at which each sensor can reliably communicate to the host, but not the processing complexity at the sensor. The resulting problem of joint design of signal encoding and estimation can be viewed as a generalization of the low-complexity quantizer bias control systems developed in Chapter 2 to the case where the encoder has the resources to perform more elaborate processing.

As in Chapters 2 and 3, we focus on the static case: we wish to determine the performance limits in terms of estimating a range-limited parameter based on digitally encoded noisy measurements. A block diagram description of the general problem is depicted in Fig. 4-1. A sequence of noise-corrupted observations $s[n]$ of an unknown parameter $A \in (-\Delta, \Delta)$ is encoded causally into a sequence of $M$-ary symbols $y[n]$. The objective at the receiver is to estimate $A$ based on the encodings $y[1], y[2], \ldots, y[n]$.


Figure 4-1: Block diagram of systems performing encoding and signal estimation. (Diagram labels: input $A \in (-\Delta, \Delta)$, sensor noise $v[n]$, parameter estimate.)


We assume that system constraints limit the average encoding rate to at most one encoded $M$-ary symbol per sensor measurement (as shown in Fig. 4-1, this rate limitation is enforced by constraining the encoder to be causal). In this chapter we consider a variety of encoding schemes. These range from batch-mode encoding strategies, where the encoder first observes all $n$ noisy measurements and then provides an $n$-symbol encoding from these measurements that can be used to form a single estimate at time $n$, to embedded fixed-rate encoders, which encode one $M$-ary symbol per available sensor measurement.


In Section 4.1 we introduce the figures of merit that we use to characterize the performance of the various encoding and estimator systems that we develop in this chapter. In Section 4.2 we develop variable-rate encoding methods and associated estimators which are asymptotically optimal, in the sense that they asymptotically achieve the performance of any consistent estimator from the original sensor measurements that can be computed at the encoder. Then, in Section 4.3 we consider fixed-rate encoding methods, which encode at the sensor one symbol for every newly available observation. We construct a class of such encoding methods which are also asymptotically optimal. We also illustrate the robustness of these encoding strategies by examining their performance in the presence of a nonadmissible noise. Finally, in Section 4.4 we present multi-sensor extensions of the single-sensor systems that are developed in this chapter.


4.1  Performance  Characterization 

For convenience, throughout this chapter we use the notation $\hat{A}[n]$ to denote an estimator of the parameter $A$ that is formed at the sensor from the original noisy observations $s[1], s[2], \ldots, s[n]$ given by (2.3), and the notation $\tilde{A}[n]$ to denote an estimator of $A$ that is formed at the host from all digitally encoded observations collected up to and including time $n$. Throughout, we refer to $\hat{A}[n]$ and $\tilde{A}[n]$ as the sensor and the host estimate, respectively. Consistent with the system constraints, we design encoders for which at any time instant $n$ the average encoding rate is less than or equal to one $M$-ary symbol per sensor measurement (i.e., the number of encoded $M$-ary symbols never exceeds the number of noisy measurements available at the encoder).

In designing the signal encoder and estimator pair we use figures of merit based on the asymptotic performance of these systems. The performance metrics we employ are analogous to the asymptotic MSE loss criterion introduced in Chapter 2. Specifically, a naturally suited measure of performance is the asymptotic MSE loss, defined as

$$ \mathcal{L}_{\rm mse}(A) \triangleq \lim_{n \to \infty} \frac{E\left[ (\tilde{A}[n] - A)^2 \right]}{B(A; s^n)} \,. \tag{4.1} $$

In all the encoding strategies we develop in this chapter, the encoder operates on a particular consistent sensor estimator $\hat{A}[n]$ formed from the original data $s^n$. A suitable measure of encoding performance for these strategies is based on comparing the MSE of the host estimate $\tilde{A}[n]$, formed from the encoding $y^n$, against that of the associated estimate $\hat{A}[n]$ computed at the sensor from the original data. For that reason, we use the notion of asymptotic processing loss of an encoder with respect to a particular sensor estimator $\hat{A}[n]$ from $s^n$, defined via

$$ \mathcal{L}_{\rm proc}(A) = \lim_{n \to \infty} \frac{E\left[ (\tilde{A}[n] - A)^2 \right]}{E\left[ (\hat{A}[n] - A)^2 \right]} \,. \tag{4.2} $$


We refer to an encoding that achieves

$$ \mathcal{L}_{\rm proc}(A) = 1 \,, \tag{4.3} $$

or

$$ \mathcal{L}_{\rm mse}(A) = 1 \,, \tag{4.4} $$

over all $|A| < \Delta$ as asymptotically optimal or asymptotically efficient, respectively. In the case that the sensor estimate $\hat{A}[n]$ (formed via $s^n$) is asymptotically efficient with respect to $B(A; s^n)$, the metrics $\mathcal{L}_{\rm mse}(A)$ and $\mathcal{L}_{\rm proc}(A)$ are identical and can thus be used interchangeably.

When designing an algorithm for encoding an asymptotically efficient sensor estimate $\hat{A}[n]$ formed from $s^n$, it is important to minimize the mean-square difference between the sensor estimate $\hat{A}[n]$ and the associated host estimate $\tilde{A}[n]$, which we refer to throughout as the residual error between these two estimates. In particular, since the MSE of the sensor estimate $\hat{A}[n]$ decays as the inverse of the number of observations (for admissible noises), whenever we can design encoding schemes for which the residual error decays faster than the inverse of the number of observations, the resulting host estimate is asymptotically optimal in the sense that it satisfies (4.3). Specifically, if the residual error decays faster than $1/n$, i.e., if

$$ \lim_{n \to \infty} n\, E\left[ (\hat{A}[n] - \tilde{A}[n])^2 \right] = 0 \,, \tag{4.5} $$


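then, by the triangle inequality for the root-mean-square norm,

$$ \sqrt{E\left[ (\tilde{A}[n] - A)^2 \right]} \le \sqrt{E\left[ (\hat{A}[n] - A)^2 \right]} + \sqrt{E\left[ (\tilde{A}[n] - \hat{A}[n])^2 \right]} \,, $$

and the definition (4.2), we obtain

$$ \mathcal{L}_{\rm proc}(A) \le \left( 1 + \lim_{n \to \infty} \sqrt{ \frac{E\left[ (\tilde{A}[n] - \hat{A}[n])^2 \right]}{E\left[ (\hat{A}[n] - A)^2 \right]} } \right)^2 = 1 \,, \tag{4.6} $$

where the limit vanishes because, by (4.5), the numerator decays faster than $1/n$, while the denominator decays as $1/n$. We also note that $1 \le \mathcal{L}_{\rm proc}(A)$ due to the data processing inequality and the asymptotic efficiency of the sensor estimate, which, in conjunction with (4.6), proves the asymptotic optimality of the corresponding encoder in the sense of (4.3) and (4.4).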


4.2  Variable-Rate  Signal  Encoders 

In this section we consider algorithms that generate variable-rate encodings. Given any consistent sensor estimator $\hat{A}[n]$ formed from $s^n$, the objective is to construct a digital encoding and a host estimator $\tilde{A}[n]$ from this encoding which (as a pair) asymptotically achieve the MSE performance of the original sensor estimator $\hat{A}[n]$. For reference, we first consider asymptotically optimal batch-type algorithms, which collect all the available data prior to generating a digital encoding from which a host estimator $\tilde{A}[n]$ can be formed. Thereafter, we present a class of variable-rate algorithms which are extensions of the batch-mode algorithms and which also asymptotically achieve the MSE of the original sensor estimator.


4.2.1  Batch-Type  Encoders

The objective in batch-type encoding is to generate an encoded description that is to be used once, at a particular instant $n$; a batch encoder first collects all the observations $s[1], s[2], \ldots, s[n]$ before forming the digital encoding $y[1], y[2], \ldots, y[n]$.

As suggested in the introduction of Chapter 2, it is straightforward to devise batch-type encoding algorithms, and associated host estimators from these encodings, that are asymptotically optimal in the sense that they achieve (4.4). For convenience and without loss of generality, we consider the case where the encoder can construct an efficient sensor estimate $\hat{A}[n]$ from the noisy data $s^n$, i.e.,

$$ E\left[ (A - \hat{A}[n])^2 \right] = \frac{B(A; s)}{n} \,. \tag{4.7} $$


In that case, the encoder first collects $s[1], s[2], \ldots, s[n]$, subsequently computes $\hat{A}[n]$, and finally encodes as $y[1], y[2], \ldots, y[n]$ the $n$ most significant $M$-ary symbols in the base-$M$ representation of $\hat{A}[n]$. If the host estimate $\tilde{A}[n]$ of $A$ based on the encoding $y^n$ is formed as the real number whose base-$M$ representation consists of the same $n$ most significant $M$-ary symbols as the sensor estimate $\hat{A}[n]$, followed by some sequence of $M$-ary symbols, the residual error between $\tilde{A}[n]$ and $\hat{A}[n]$ decays to zero exponentially with $n$, i.e.,

$$ E\left[ (\tilde{A}[n] - \hat{A}[n])^2 \right] \le D\, M^{-2n} \,, \tag{4.8} $$


and, hence, it satisfies (4.5). The constant $D$ in (4.8) depends on the parameter range $\Delta$ and, in particular, satisfies $D \ge \Delta^2$. By using (4.7) and (4.8) and the triangle inequality, we obtain

$$ \lim_{n \to \infty} n\, E\left[ (\tilde{A}[n] - A)^2 \right] = B(A; s) \,, $$

i.e., the host estimate $\tilde{A}[n]$ is asymptotically efficient with respect to $B(A; s)$. In fact, $\tilde{A}[n]$ effectively achieves the bound $B(A; s)/n$ rapidly with increasing $n$, since the residual error (4.8) between $\tilde{A}[n]$ and $\hat{A}[n]$ decays exponentially with $n$, while the MSE of the original sensor estimate $\hat{A}[n]$ in (4.7) decays only as $1/n$.

Similarly, if the estimate $\hat{A}[n]$ formed at the encoder via $s^n$ is asymptotically efficient with respect to $B(A; s^n)$, the described batch encoding method produces an encoding from which the corresponding $\tilde{A}[n]$ is also asymptotically efficient. In general, if $\hat{A}[n]$ is any consistent estimator, the resulting $\tilde{A}[n]$ based on batch encoding is asymptotically optimal, in the sense that it satisfies (4.3). As we discussed at the outset of Chapter 2, although this simple batch encoding method asymptotically achieves the MSE performance of the original sensor estimate $\hat{A}[n]$ (and the Cramer-Rao bound $B(A; s)/n$ in case $\hat{A}[n]$ is asymptotically efficient), it has the major disadvantage that no encoded bits can be generated until all the observations are available. Moreover, it is not refinable, since no method is suggested for encoding any additional $M$-ary symbols as new sensor measurements are collected.
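The batch scheme above amounts to truncating a base-$M$ expansion. Because the text does not pin down how a signed value in $(-\Delta, \Delta)$ is written in base $M$, the sketch below assumes the affine map $u = (\hat{A} + \Delta)/(2\Delta)$ onto $[0, 1)$ and decodes the received digits followed by zeros. Under that assumption the residual $\hat{A}[n] - \tilde{A}[n]$ lies in $[0, 2\Delta M^{-n})$, which corresponds to $D = 4\Delta^2$ in (4.8). The function names and the simulation values are illustrative.

```python
import numpy as np


def batch_encode(a_hat, n, M, Delta):
    """The n most significant base-M digits of a_hat, after the (assumed) affine
    map u = (a_hat + Delta) / (2 Delta) from (-Delta, Delta) onto [0, 1)."""
    u = min(max((a_hat + Delta) / (2 * Delta), 0.0), np.nextafter(1.0, 0.0))  # clip into [0, 1)
    digits = []
    for _ in range(n):
        u *= M
        d = min(int(u), M - 1)   # next digit; min() guards against round-off
        digits.append(d)
        u -= d
    return digits


def batch_decode(digits, M, Delta):
    """Host estimate: the received digits followed by zeros, mapped back to (-Delta, Delta)."""
    u = sum(d * float(M) ** -(i + 1) for i, d in enumerate(digits))
    return 2 * Delta * u - Delta


rng = np.random.default_rng(0)
A, sigma_v, Delta, M, n = 0.2, 0.1, 1.0, 2, 30
a_hat = float(np.mean(A + sigma_v * rng.standard_normal(n)))   # sensor estimate (sample mean)
a_tilde = batch_decode(batch_encode(a_hat, n, M, Delta), M, Delta)
print(abs(a_tilde - a_hat), 2 * Delta * M ** -n)   # residual vs. its deterministic bound
```

Because the residual bound is deterministic and shrinks by a factor $M$ per symbol, it quickly falls far below the $1/n$ decay of the sensor estimate's MSE, which is the mechanism behind (4.5).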

4.2.2  Refinable  Variable-Rate  Encoding  Algorithms 

As we have seen, batch encoding algorithms produce asymptotically optimal host estimates, since the residual error between the sensor estimate and the host estimate decays at a faster rate than the mean-square error in the sensor estimate. In fact, the residual error between the two estimates decays exponentially fast with the number of observations, as opposed to the MSE in the sensor estimate, which decays only as the inverse of the number of observations.

We can exploit this exponentially fast improvement in the host estimate quality to construct refinable variable-rate signal encoding strategies that are asymptotically optimal. In particular, by repeatedly using the batch-type encoding algorithm at a sequence of appropriately spaced time instants $N_k$, we can construct variable-rate encoding strategies which achieve (4.3) and for which the average encoding rate never exceeds one $M$-ary symbol per observation. Specifically, at each $n = N_k$ for $k = 1, 2, \ldots$ we may use the batch-type algorithm to encode the sensor estimate $\hat{A}[N_k]$ obtained from $s^{N_k}$ into the $(N_k - N_{k-1})$ $M$-ary symbols $y[N_{k-1} + 1], \ldots, y[N_k - 1], y[N_k]$, based on which the host estimate $\tilde{A}[N_k]$ is to be formed. Since no encoded symbols are supplied by the encoder to the host between time instants $n = N_{k-1} + 1$ and $n = N_k - 1$, the host may use as its estimate for all these time instants the most current host estimate, namely, $\tilde{A}[N_{k-1}]$.

Note that $N_{k-1}$ and $N_k$ must be spaced far enough from one another so that the number of $M$-ary symbols used to describe the sensor estimate $\hat{A}[N_k]$ (i.e., $N_k - N_{k-1}$) is large enough to guarantee that the residual error decays faster than $1/N_k$. On the other hand, since no encoded symbols are communicated between $n = N_{k-1} + 1$ and $n = N_k - 1$, the time instants $N_{k-1}$ and $N_k$ should still be close enough that, during the delay incurred by this batch-type scheme, the “old” host estimate $\tilde{A}[N_{k-1}]$ remains “accurate enough”. The following theorem describes how these instants $N_k$ can be spaced in time so as to guarantee that the residual error between the host estimate $\tilde{A}[n]$ and the sensor estimate $\hat{A}[n]$ decays faster than $1/n$.

Theorem 1 Let

$$ N_{k+1} = N_k + h[N_k] \tag{4.9} $$

for $k \ge 1$, initialized with $N_1 \ge 1$, where $h : \mathbb{N}^+ \to \mathbb{N}^+$. Consider the encoding strategy which at time $n = N_{k+1}$ encodes as $y[N_k + 1], y[N_k + 2], \ldots, y[N_{k+1}]$ the $h[N_k]$ most significant symbols in the base-$M$ representation of a consistent sensor estimator $\hat{A}[n]$ from $s^n$. Let $\tilde{A}[N_{k+1}]$ denote the number whose $h[N_k]$ most significant symbols in the base-$M$ representation are given by $y[N_k + 1], y[N_k + 2], \ldots, y[N_{k+1}]$, followed by 0's. If the function $h[\cdot]$ satisfies both

$$ \lim_{n \to \infty} \frac{h[n]}{n} = 0 \tag{4.10a} $$

and

$$ \limsup_{n \to \infty} \frac{\ln(n)}{h[n]} < 2 \ln(M) \,, \tag{4.10b} $$

then the host estimator $\tilde{A}[n]$ given by

$$ \tilde{A}[n] = \tilde{A}\Big[ \max_{N_k \le n} N_k \Big] \tag{4.11} $$

is asymptotically optimal in the sense that it achieves (4.3).
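Condition (4.10b) is what makes the residual error vanish faster than $1/n$. At times $N_{k+1} \le n < N_{k+2}$ the host relies on $h[N_k]$ symbols, so with an error bound of the form (4.8) the quantity $n\, M^{-2h[N_k]}$ must go to zero. The deterministic Python sketch below (ours, with $M = 2$ and $N_1 = 2$ as illustrative choices) evaluates this quantity for three choices of $h$: $\lceil \sqrt{n}\, \rceil$ and $\lceil 2\ln n \rceil$ satisfy (4.10), while $\lceil \sqrt{\ln n}\, \rceil$ violates (4.10b).

```python
import math


def schedule(h, N1, n_max):
    """Encoding instants N_1 < N_2 < ... <= n_max from the recursion (4.9)."""
    Ns = [N1]
    while Ns[-1] + h(Ns[-1]) <= n_max:
        Ns.append(Ns[-1] + h(Ns[-1]))
    return Ns


def scaled_residual(h, M, N1, n_max):
    """N_{k+2} * M**(-2 h[N_k]) for the last complete block before n_max.
    For N_{k+1} <= n < N_{k+2} the host relies on h[N_k] base-M symbols, so this
    bounds n * (residual error) / D; it must vanish for (4.5) to hold."""
    Ns = schedule(h, N1, n_max)
    k = len(Ns) - 3
    return math.exp(math.log(Ns[k + 2]) - 2 * h(Ns[k]) * math.log(M))


choices = {
    "ceil(sqrt(n))": lambda n: math.ceil(math.sqrt(n)),
    "ceil(2 ln n)": lambda n: math.ceil(2 * math.log(n)),
    "ceil(sqrt(ln n))": lambda n: math.ceil(math.sqrt(math.log(n))),  # violates (4.10b)
}
table = {name: [scaled_residual(h, 2, 2, n_max) for n_max in (10**3, 10**4, 10**5, 10**6)]
         for name, h in choices.items()}
for name, vals in table.items():
    print(f"{name:18s}", " ".join(f"{v:9.2e}" for v in vals))
```

The logarithmic choice shows that (4.10b) only requires $h[n]$ to grow like $\ln n$, which keeps each block, and hence the encoding delay, short.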
Figure 4-2: MSE performance of $\tilde{A}[n]$ from (4.11) in Gaussian noise, where $\hat{A}[n]$ is the sample mean (4.12). Simulation parameters: $\Delta = 1$, $A = 0.2$, $\sigma_v = 0.1$.

A proof of the asymptotic optimality of the encoding class of Theorem 1 is included in App. C.1.

Fig. 4-2 depicts the MSE performance of two instances of the host estimator $\tilde{A}[n]$ from (4.11) in the case that $M = 2$ and $v[k] \sim \mathcal{N}(0, \sigma_v^2)$, where the estimator based on $s^n$ is the sample mean, i.e.,

$$ \hat{A}[n] = \frac{1}{n} \sum_{k=1}^{n} s[k] \,. \tag{4.12} $$

In particular, the solid and the dashed curves in the figure depict the MSE performance of $\tilde{A}[n]$ in the two special cases that the function $h[\cdot]$ in (4.9) is given by

$$ h[n] = \lceil \sqrt{n}\, \rceil \,, \tag{4.13} $$

and


$$ h[n] = \lceil 2 \ln(n) \rceil \,, \tag{4.14} $$

respectively, where $\lceil x \rceil$ denotes the smallest integer that is greater than or equal to $x$. In both cases the recursion (4.9) is initialized with $N_1 = 2$. One can easily verify that both (4.13) and (4.14) satisfy (4.10). The dotted line in the figure depicts the Cramer-Rao bound $B(A; s^n)$, which in this Gaussian case is achieved by the sample-mean estimator (4.12) for all $n$. As the figure illustrates, both host estimates are asymptotically efficient with respect to this bound. Clearly, the particular choice of $N_1$ and $h[\cdot]$ dictates how fast the MSE of the host estimate $\tilde{A}[n]$ approaches the bound $B(A; s^n)$. In general, optimal selection of $N_1$ and $h[\cdot]$ depends on the signal-to-noise ratio $\Delta/\sigma_v$ and on the particular sensor noise and sensor estimate characteristics.
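A Monte Carlo sketch in the spirit of Fig. 4-2 follows, using the parameters from its caption ($\Delta = 1$, $A = 0.2$, $\sigma_v = 0.1$) with $M = 2$, $N_1 = 2$, and $h[n] = \lceil \sqrt{n}\, \rceil$. It is not the code used for the figure. The trial count, the seed, and the base-$M$ convention (an affine map of $(-\Delta, \Delta)$ onto $[0, 1)$ before taking digits) are our assumptions. The printed ratios estimate the MSE of $\tilde{A}[n]$ relative to $B(A; s^n) = \sigma_v^2/n$. Because the sample mean is efficient in Gaussian noise, this ratio is also the processing loss (4.2).

```python
import math
import numpy as np


def schedule(h, N1, n_max):
    """Encoding instants N_1 < N_2 < ... <= n_max from N_{k+1} = N_k + h[N_k], (4.9)."""
    Ns = [N1]
    while Ns[-1] + h(Ns[-1]) <= n_max:
        Ns.append(Ns[-1] + h(Ns[-1]))
    return Ns


def truncate_digits(a_hat, n_digits, M, Delta):
    """Keep the n_digits most significant base-M digits (followed by zeros), using
    the assumed map (-Delta, Delta) -> [0, 1), then map back."""
    u = np.clip((a_hat + Delta) / (2 * Delta), 0.0, np.nextafter(1.0, 0.0))
    scale = float(M) ** n_digits
    return 2 * Delta * np.floor(u * scale) / scale - Delta


def host_estimate(sensor_means, Ns, h, M, Delta, n):
    """A_tilde[n] of (4.11): the latest instant N_{k+1} <= n delivered h[N_k]
    symbols describing the sensor estimate at time N_{k+1}."""
    j = max(i for i, N in enumerate(Ns) if N <= n)
    if j == 0:
        raise ValueError("no encoded block is available yet")
    return truncate_digits(sensor_means[:, Ns[j] - 1], h(Ns[j - 1]), M, Delta)


def h(n):
    return math.ceil(math.sqrt(n))          # the choice (4.13)


M, Delta, A, sigma_v, N1 = 2, 1.0, 0.2, 0.1, 2
n_max, trials = 10_000, 400
rng = np.random.default_rng(1)
s = A + sigma_v * rng.standard_normal((trials, n_max))
sensor_means = np.cumsum(s, axis=1) / np.arange(1, n_max + 1)   # sample mean (4.12)
Ns = schedule(h, N1, n_max)
ratios = {}
for n in (100, 1_000, 10_000):
    mse = np.mean((host_estimate(sensor_means, Ns, h, M, Delta, n) - A) ** 2)
    ratios[n] = mse / (sigma_v**2 / n)      # MSE relative to B(A; s^n) = sigma_v^2 / n
    print(n, round(ratios[n], 3))
```

Only the host estimate at the evaluation times is reconstructed, since between encoding instants $\tilde{A}[n]$ is held constant by (4.11).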

4.3  Fixed-Rate  Encodings 

Although the host estimate $\tilde{A}[n]$ in (4.11) is optimal in the sense that it asymptotically achieves the MSE of the sensor estimate $\hat{A}[n]$ formed from the original measurements, the encoded sequence is not generated at a fixed encoding rate. In particular, delay is inherently incurred by the encoder, and this encoding delay increases with $n$. In general, we may want to construct embedded algorithms that generate fixed-rate data encoding descriptions, i.e., algorithms which provide one $M$-ary symbol of the description for each newly available observation.

This problem has many similarities to that of successive refinement of information [17]. In that problem a sequence of $n$ IID random variables of known PDF is observed, and the task is to form a successively refinable approximate description of these $n$ observations that achieves optimal or close to optimal approximation quality at every description level, as measured by a rate-distortion metric. Analogously, in the problem we address in this section the PDF of the IID random variables $s[n]$ is known up to an uncertainty in the mean. The task is to form a multi-stage encoding such that the estimate sequence $\check{A}[n]$ generated from the $n$th-stage encoding $y^n$ is asymptotically optimal, i.e., the MSE performance of $\check{A}[n]$ achieves $B(A; s^n)$ for all sufficiently large $n$.

In this section we develop fixed-rate digital encodings that result in asymptotically efficient estimation with respect to $B(A; s)$. We focus on the case $M = 2$, although similar asymptotically optimal schemes can also be designed for $M > 2$.¹

In general, an embedded fixed-rate binary encoder is a rule for selecting the $n$th encoded bit $y[n]$ based on $y^{n-1}$ and $s^n$. The approach we follow in this section constrains the encoding strategy to select $y[n]$ based on $y^{n-1}$ and $\hat{A}[n]$, where $\hat{A}[\cdot]$ denotes a rule for obtaining an estimate of $A$ based on $s^n$. In that sense, the objective of the encoding strategy is to achieve the performance of a particular estimator $\hat{A}[n]$ formed from the original observations.

To motivate the design of the encoders presented in this section, it is worthwhile to revisit the encoders developed in Chapter 2, which also generate fixed-rate digitally encoded descriptions. As we have seen, encodings in the form of quantizer bias control via feedback such as (2.59), when used in conjunction with the associated host estimators developed in Section 2.3.3, come within $10 \log_{10} \mathcal{L}(A_*)$ dB of the optimal performance $B(A; s)$. For instance, in the Gaussian scenario, selecting

$$y[n] = \operatorname{sgn}\left(s[n] - \check{A}[n-1]\right) \qquad (4.15)$$

with $\check{A}[n-1]$ given by the linear-complexity algorithm (2.58) results in a 2 dB loss.
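The 2 dB figure can be checked with a few lines of Python. The sketch below assumes the idealized situation in which the feedback threshold sits exactly at $A$ (the operating point the feedback recursion is designed to approach) and uses the standard Fisher-information expression for a single binary observation; the function names are hypothetical.

```python
import math

def bit_fisher_info_at_threshold(pdf_at_zero):
    """Fisher information about A carried by y = sgn(A + v - theta) when theta = A.

    For a binary observation with P(y = +1) = 1 - F_v(theta - A), the Fisher
    information is p_v(theta - A)**2 / (P * (1 - P)). For a symmetric noise PDF
    and theta = A we have P = 1/2, so it reduces to 4 * p_v(0)**2.
    """
    return 4.0 * pdf_at_zero ** 2

def gaussian_feedback_loss_db(sigma_v):
    """Information loss, in dB, of one bit thresholded at A versus one Gaussian sample."""
    pdf0 = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_v)     # N(0, sigma_v^2) density at 0
    fisher_bit = bit_fisher_info_at_threshold(pdf0)       # = 2 / (pi * sigma_v^2)
    fisher_sample = 1.0 / sigma_v ** 2                    # one unquantized sample
    return 10.0 * math.log10(fisher_sample / fisher_bit)  # ratio of Cramer-Rao bounds

print(round(gaussian_feedback_loss_db(1.0), 2))  # 1.96
```

The loss ratio equals $\pi/2$ independently of $\sigma_v$, which is the roughly 2 dB gap quoted above.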

Considerable insight into the design of asymptotically optimal schemes can be gained by examining the performance limits of the low-complexity structure (2.58), originally developed for signal estimation from the encodings generated by (4.15). Specifically, it is instructive to consider the MSE performance of the host estimate $\check{A}[n]$ given by the low-complexity algorithm (2.58) in the case that the sensor knows the static signal $A$ precisely. In that case it could use an encoding of the form (4.15) in which the noisy measurements $s[n]$ are replaced by $A$. The MSE performance of the resulting host estimate $\check{A}[n]$ is described by the following theorem, which we prove in App. C.2.

Theorem 2 Given $0 < c < \infty$, consider the dynamical system

$$\check{A}[n] = \check{A}[n-1] + \frac{c}{n} \operatorname{sgn}\left(A - \check{A}[n-1]\right) , \qquad (4.16)$$

initialized with $\check{A}[n_0]$ for some $n_0 > 1$. Then

$$\lim_{n \to \infty} \check{A}[n] = A \qquad (4.17)$$

for any $n_0$, any initialization $\check{A}[n_0]$, and any $A$. In addition, the mean-square difference between $A$ and $\check{A}[n]$ decays as $1/n^2$. In particular, for almost all initial conditions

$$\limsup_{n \to \infty} n^2 \left|\check{A}[n] - A\right|^2 = c^2 . \qquad (4.18)$$

¹ In fact, we can easily design asymptotically efficient schemes for $M > 2$ by trivial extensions of the $M = 2$ case, namely, by exploiting at the host only two of the $M$ available encoded levels. Although beyond the scope of this thesis, the design of algorithms that generate optimized fixed-rate $M$-level encodings is clearly a problem worth further investigation.
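The behavior described by Theorem 2 is easy to observe numerically. The following Python sketch iterates (4.16) from a deliberately poor initialization. The values of $A$, $c$, $n_0$ and the initialization are arbitrary illustrative choices, and $\operatorname{sgn}(0)$ is taken as $+1$, as a binary encoder must pick one of the two symbols.

```python
def theorem2_scaled_errors(A, c, a_init, n0, n_max):
    """Iterate (4.16) for n = n0+1, ..., n_max; return n**2 * (A_check[n] - A)**2."""
    a_check = a_init                                    # A_check[n0]
    scaled = []
    for n in range(n0 + 1, n_max + 1):
        step = c / n
        a_check += step if A - a_check >= 0 else -step  # sgn(0) taken as +1
        scaled.append(n * n * (a_check - A) ** 2)
    return scaled

scaled = theorem2_scaled_errors(A=0.3, c=1.0, a_init=5.0, n0=2, n_max=20000)
tail = scaled[len(scaled) // 2:]      # discard the initial transient
print(max(tail))                      # stays below c**2 = 1 while coming close to it
```

Once the iterate first overshoots $A$, each later error satisfies $|\check{A}[n] - A| < c/n$, so the scaled error in the tail never exceeds $c^2$. Its repeated approach to $c^2$ is the $\limsup$ in (4.18).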

As Theorem 2 suggests, when an error-free estimate is available and used for encoding at the sensor, the residual error decays as $1/n^2$. In the actual setting, where only noisy measurements of $A$ are available at the sensor, replacing $s[n]$ with a sensor estimate $\hat{A}[n]$ can in fact improve the MSE performance of the host estimate. In particular, we consider the following binary encoding method:

$$y[n] = \operatorname{sgn}\left(\hat{A}[n] - \check{A}[n-1]\right) , \qquad (4.19a)$$

where the host estimate $\check{A}[n]$ based on $y^n$ is given by

$$\check{A}[n] = \begin{cases} \text{from look-up table} & \text{if } n \le n_0 \\[4pt] \check{A}[n-1] + \dfrac{\lambda \sigma_v}{n}\, y[n] & \text{if } n > n_0 \end{cases} \qquad (4.19b)$$

and where we are also interested in optimally selecting the parameter $\lambda$. As in Chapter 2, we assume that $v[n] = \sigma_v \tilde{v}[n]$. In that sense, the optimal selection of $\lambda$ will depend on $p_{\tilde{v}}(\cdot)$, the particular sensor noise generating PDF.² The estimator to be used with this encoder is given by (4.19b). Note the similarity of algorithms (4.19b) and (2.58) in terms of their dependence on the sensor noise scaling $\sigma_v$.

The look-up table structure in (4.19b) is used to provide faster convergence to asymptotically optimal behavior. Specifically, at high peak SNR (i.e., for $\Delta \gg \sigma_v$), the first collection of bits $y[1], \ldots, y[n_0]$ may be used to encode a coarse description of $A$, ignoring the effects of sensor noise, i.e.,

$$\check{A}[n] = \check{A}[n-1] + \frac{\Delta}{2^n}\, y[n] \quad \text{for } n \le n_0 , \qquad (4.20)$$

initialized with $\check{A}[0] = 0$, and where $y[n]$ is given by (4.19a). Naturally, the selection of a suitable value for $n_0$ in (4.19b) depends on $\Delta$, $\sigma_v$, and the particular sensor noise PDF. We note, however, that the proper value of $n_0$ depends logarithmically on $\chi = \Delta/\sigma_v$, as (4.20) suggests about the convergence of the MSE of $\check{A}[n]$ to that of $\hat{A}[n]$ (see also the associated discussion in Section 2.3.3).

² More generally, we may consider nonstationary $\lambda$'s, i.e., $\lambda$'s of the form $\lambda[n] = f(y^n)$. Although proper design of the resulting encoders can potentially further improve the performance of the systems presented in this section, its investigation is beyond the scope of this thesis.
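The coarse stage can be sketched in a few lines of Python. The sketch makes two assumptions that are not stated in this section. First, $A$ lies in the dynamic range $(-\Delta, \Delta)$. Second, the sensor estimate is noise-free during this stage, which is the high-peak-SNR idealization. Under these assumptions each bit halves the uncertainty interval, so after $n$ bits the error is at most $\Delta/2^n$. The rule $n_0 = \lceil \log_2(\Delta/\sigma_v) \rceil$ used below is a hypothetical choice that illustrates the logarithmic dependence on $\chi$.

```python
import math

def coarse_stage(A, delta, n0):
    """Run (4.20) with a noise-free sensor estimate; return [A_check[1], ..., A_check[n0]]."""
    a_check = 0.0                                  # A_check[0] = 0
    history = []
    for n in range(1, n0 + 1):
        y = 1.0 if A - a_check >= 0 else -1.0      # (4.19a) with A_hat[n] replaced by A
        a_check += delta / 2 ** n * y              # the step size halves at every stage
        history.append(a_check)
    return history

delta, sigma_v = 1.0, 0.01
n0 = math.ceil(math.log2(delta / sigma_v))         # hypothetical rule: 7 bits for chi = 100
hist = coarse_stage(A=0.377, delta=delta, n0=n0)
for n, a in enumerate(hist, start=1):
    assert abs(a - 0.377) <= delta / 2 ** n        # error bound Delta / 2^n
print(n0, abs(hist[-1] - 0.377) <= sigma_v)        # 7 True
```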

Since we are primarily concerned with the performance of the algorithm (4.19) for large $n$, we focus on the encoding performance for $n > n_0$. The block diagram of the sequential encoder of observations $s[n]$ into bits $y[n]$ for $n > n_0$ is shown in Fig. 4-3. Intuitively, at any time instant $n$, the sensor encodes the sign of the difference between the current sensor estimate $\hat{A}[n]$ and the most recent host estimate $\check{A}[n-1]$. The associated host decoder/estimator is shown in Fig. 4-4. As we have remarked, for $n \le n_0$ both the decoder and the encoder may employ a look-up table to obtain $\check{A}[n]$.
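The encoder/decoder pair can be sketched in Python as follows. The sketch relies on the fact that the sensor estimates $\hat{A}[n]$ do not depend on the transmitted bits, so they can be precomputed. The sensor evaluates $\check{A}[n-1]$ by running the same recursion as the host. As an assumption, the look-up table stage is played by the coarse recursion (4.20). The values of $A$, $\sigma_v$, $\Delta$, $n_0$ and $\lambda$ below are illustrative only.

```python
import random

def encode_decode(sensor_estimates, lam, sigma_v, delta, n0):
    """Fixed-rate encoder (4.19a) and host decoder (4.19b).

    sensor_estimates holds A_hat[1], ..., A_hat[N]. For n <= n0 the look-up
    table is replaced by the coarse recursion (4.20), which is an assumption.
    """
    a_check = 0.0                                   # A_check[0] = 0
    bits, host = [], []
    for n, a_hat in enumerate(sensor_estimates, start=1):
        y = 1 if a_hat >= a_check else -1           # (4.19a), sgn(0) taken as +1
        if n <= n0:
            a_check += delta / 2 ** n * y           # coarse stage (4.20)
        else:
            a_check += lam * sigma_v / n * y        # fine stage (4.19b)
        bits.append(y)
        host.append(a_check)
    return bits, host

def running_mean(samples):
    """Sample-mean sensor estimates (4.12) for n = 1, ..., N."""
    total, means = 0.0, []
    for n, s in enumerate(samples, start=1):
        total += s
        means.append(total / n)
    return means

rng = random.Random(1)
A, sigma_v, delta = 0.2, 0.1, 1.0
samples = [A + rng.gauss(0.0, sigma_v) for _ in range(5000)]
a_hat = running_mean(samples)
bits, a_check = encode_decode(a_hat, lam=1.0, sigma_v=sigma_v, delta=delta, n0=4)
print(abs(a_check[-1] - a_hat[-1]), abs(a_check[-1] - A))
```

The first printed number, the host-sensor residual, should be orders of magnitude smaller than the second, the host error. This is the practical sense in which the host inherits the accuracy of $\hat{A}[n]$.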

This class of fixed-rate binary encoder and estimator pairs has very attractive asymptotic properties. In particular, recall that (4.18) provides a bound on the best possible decay rate of the mean-square difference between $\hat{A}[n]$ and $\check{A}[n]$ for any algorithm of the form (4.19). This is because the algorithm (4.19) encodes an estimate $\hat{A}[n]$ rather than $A$, which was used in Theorem 2 (see (4.16)). As we will see in the following sections, if the MSE of $\hat{A}[n]$ decays as $1/n$ and if $\hat{A}[n]$ satisfies a mild set of conditions, the resulting residual error between the host and the sensor estimate decays as $1/n^2$. This guarantees that the resulting host estimate $\check{A}[n]$ is asymptotically optimal, in the sense that it achieves (4.3).

Figure 4-4: Block diagram of the sequential decoder associated with the encoder described by (4.19) for $n > n_0$.


4.3.1  Gaussian  Sensor  Noise 

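For Gaussian sensor noise, fixed-rate encodings of the form (4.19) can be designed that have low computational complexity and are asymptotically efficient in the sense that they achieve (4.4). Specifically, consider encodings of the form (4.19) where the sensor estimate $\hat{A}[n]$ is the sample mean of the original noisy measurements, given by (4.12). Then, for sufficiently large $n$ we can easily show that

$$E\left[\left(\hat{A}[n] - \check{A}[n-1]\right)^2\right] \approx \beta\, \sigma_v^2 / n^2 \qquad (4.21)$$

for some $\beta > 1$. In particular, as shown in App. C.3,

$$\lim_{n \to \infty} n^2\, E\left[\left(\hat{A}[n] - \check{A}[n-1]\right)^2\right] = \beta\, \sigma_v^2 . \qquad (4.22)$$

Using the readily verified identity satisfied by the sample-mean estimate

$$E\left[\left(\hat{A}[n] - \hat{A}[n-1]\right)^2\right] = \frac{\sigma_v^2}{n^2}\left(1 + \frac{1}{n-1}\right) , \qquad (4.23)$$

the triangle inequality, and (4.22), we obtain the following bounds on the residual error:

$$\left(\sqrt{\beta} - 1\right)^2 \sigma_v^2 \le \lim_{n \to \infty} n^2\, E\left[\left(\hat{A}[n-1] - \check{A}[n-1]\right)^2\right] \le \left(\sqrt{\beta} + 1\right)^2 \sigma_v^2 . \qquad (4.24)$$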




Figure 4-5: Block diagram of the sequential encoder (4.12)-(4.19b) for $n > n_0$, for asymptotically efficient estimation in white Gaussian noise.

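The set of inequalities (4.24) implies that the residual between $\hat{A}[n]$ and $\check{A}[n]$ decays as $1/n^2$. Hence, since the sample mean $\hat{A}[n]$ is an efficient estimator in additive IID Gaussian noise, we have

$$B(A; s) \le \lim_{n \to \infty} n\, E\left[\left(\check{A}[n] - A\right)^2\right] \le \lim_{n \to \infty} n \left\{ \sqrt{E\left[\left(\check{A}[n] - \hat{A}[n]\right)^2\right]} + \sqrt{E\left[\left(\hat{A}[n] - A\right)^2\right]} \right\}^2 = B(A; s) ,$$

which reveals that the host estimate $\check{A}[n]$ formed from $y^n$ is asymptotically efficient with respect to $B(A; s^n)$.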

In this case, the block diagram of the sequential encoder of the original noisy observations $s[n]$ into bits $y[n]$ for $n > n_0$ specializes to the low-complexity structure shown in Fig. 4-5. The associated decoder/estimator is again the one shown in Fig. 4-4. For $n \le n_0$ both the decoder and the encoder may obtain $\check{A}[n]$ by means of the same look-up table. Again, in the coarse stage of the description (i.e., $n \le n_0$) the residual error between the two estimates decays exponentially with $n$, since the noise level is small compared to the dynamic range.

Although the host estimate $\check{A}[n]$ obtained via (4.19) and (4.12) is asymptotically efficient for any $\lambda > 0$ in (4.19b), we can choose the value of $\lambda$ so as to minimize the residual error given by (4.21). Specifically, as we show in App. C.3, $\beta$ in (4.21) and $\lambda$ in (4.19b) are related as follows:

$$\beta(\lambda) = \frac{\pi\left(1 + \lambda^2\right)^2}{8 \lambda^2} , \qquad (4.25)$$

so that

$$\min_{\lambda} \beta(\lambda) = \pi/2 ,$$

which is achieved for $\lambda = 1$. Fig. 4-6 depicts the dependence of the residual error on the chosen value of $\lambda$. As (4.25) reveals, selecting a value of $\lambda$ an order of magnitude larger or smaller than the optimal value increases the residual error by about 14 dB for any given $n$.

Figure 4-6: Resulting residual error scaling $\beta$ as a function of the parameter $\lambda$ in (4.19b).
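The minimizer of (4.25) and the 14 dB penalty can be verified numerically. The following Python sketch evaluates the closed form directly. The optional `gamma` argument anticipates the more general form (4.29) and defaults to the sample-mean value $\gamma = 1$.

```python
import math

def beta(lam, gamma=1.0):
    """Residual-error scaling (4.25) for gamma = 1 or, more generally, (4.29)."""
    return math.pi * (gamma + lam ** 2) ** 2 / (8.0 * lam ** 2)

# Coarse grid search for the minimizing lambda (closed form: lambda = 1, beta = pi/2)
grid = [k / 1000 for k in range(100, 10001)]
lam_best = min(grid, key=beta)
print(lam_best, round(beta(lam_best), 4))                  # 1.0 1.5708

# Penalty for a lambda ten times too large or too small, in dB
print(round(10 * math.log10(beta(10.0) / beta(1.0)), 2))  # 14.07
print(round(10 * math.log10(beta(0.1) / beta(1.0)), 2))    # 14.07
```

The two penalties coincide because $\beta(\lambda) \propto \lambda^2 + 2 + \lambda^{-2}$, which is symmetric under $\lambda \mapsto 1/\lambda$.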

Fig. 4-7 illustrates the validity of our analysis for this Gaussian scenario by means of Monte Carlo simulations for a specific example where $\Delta = 1$, $\sigma_v = 1$, $A = 0.2$, and $\lambda = 1$. The dotted line in Fig. 4-7(a) represents the Cramér-Rao bound $B(A; s^n)$, while the solid curve depicts the MSE of the host estimate $\check{A}[n]$ obtained from Monte Carlo simulations. Fig. 4-7(b) depicts the associated residual error (4.21) obtained from simulations (solid) and the estimate of the residual error (dashed) obtained via (4.21) and (4.25).
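A small-scale version of this experiment can be sketched in Python. The sketch assumes no coarse stage ($n_0 = 0$, $\check{A}[0] = 0$), which is harmless here because $\sigma_v$ is not small compared with $A$. The trial count and horizon are arbitrary. It reports the host MSE normalized by the Cramér-Rao bound $\sigma_v^2/n$ and the normalized residual $n^2 E[(\hat{A}[n] - \check{A}[n-1])^2]/\sigma_v^2$, which (4.21) and (4.25) predict to be about $\pi/2$.

```python
import random

def gaussian_host_statistics(A, sigma_v, lam, n_max, trials, seed=0):
    """Monte Carlo estimates, at n = n_max, of
    n * E[(A_check[n] - A)^2] / sigma_v^2            (compare with 1, the CRB)
    n^2 * E[(A_hat[n] - A_check[n-1])^2] / sigma_v^2  (compare with beta in (4.21))
    for the sample-mean encoder (4.12), (4.19) with n0 = 0 and A_check[0] = 0."""
    rng = random.Random(seed)
    sum_err = sum_res = 0.0
    for _ in range(trials):
        total, a_check = 0.0, 0.0
        for n in range(1, n_max + 1):
            total += A + rng.gauss(0.0, sigma_v)
            a_hat = total / n                    # sample mean (4.12)
            residual = a_hat - a_check           # A_hat[n] - A_check[n-1]
            y = 1 if residual >= 0 else -1       # (4.19a)
            a_check += lam * sigma_v / n * y     # (4.19b)
        sum_err += (a_check - A) ** 2
        sum_res += residual ** 2
    n = n_max
    return n * sum_err / (trials * sigma_v ** 2), n * n * sum_res / (trials * sigma_v ** 2)

mse_ratio, beta_hat = gaussian_host_statistics(A=0.2, sigma_v=1.0, lam=1.0, n_max=2000, trials=300)
print(mse_ratio, beta_hat)
```

Up to Monte Carlo error, the first number should be close to 1 and the second of the order of $\pi/2$. The value (4.25) itself rests on the approximations made in App. C.3.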

4.3.2 Robust Encodings in NonGaussian Finite-Variance Noise

The low-complexity encoding method consisting of (4.19) and the sample mean $\hat{A}[n]$ in (4.12) is very robust with respect to variations in the sensor noise PDF. In particular, it achieves similar performance characteristics when the sensor noise $v[n]$ is an IID finite-variance nonGaussian admissible process, as we now show.

Figure 4-7: Performance of $\check{A}[n]$ from (4.19b), where $y[n]$ is given by (4.19a) and $\hat{A}[n]$ is the sample mean (4.12). (a) Mean-square estimation error in $\check{A}[n]$. (b) Residual error.

For convenience, we assume that $\tilde{v}[n]$ has a unit-variance distribution, in which case $\sigma_v^2$ equals the variance of the sensor noise $v[n]$. Without loss of generality we consider the case where the sensor noise is zero-mean. In this case, as is well known, the sample mean $\hat{A}[n]$ from (4.12) forms a consistent estimator of $A$ with MSE equal to $\sigma_v^2/n$. The method that we used in App. C.3 to show that the host estimate $\check{A}[n]$ of the previous section has asymptotic MSE equal to $\sigma_v^2/n$ applies exactly to this case as well. That is, the encoder/estimator structure described by (4.12) and (4.19) provides a host estimate $\check{A}[n]$ with asymptotic MSE equal to the noise power divided by the number of available observations. Conveniently, the encoder and the decoder do not even require knowledge of $\sigma_v$ to obtain an estimate. It is simply required that both use the same system parameters, i.e., the same look-up table and the same value of $\lambda \sigma_v$ in (4.19b). However, knowledge of $\sigma_v$ can be exploited to provide faster convergence to asymptotic efficiency via optimal selection of $\lambda \sigma_v$.

Although attractive due to its simplicity, this approach does not always lead to a better asymptotic MSE than the quantizer bias control encoding approach described by (4.15) and (4.19b). This is clearly illustrated in the special case where the sensor noise is Laplacian, i.e.,

$$p_v(v) = \frac{1}{\sqrt{2}\,\sigma_v}\, e^{-\sqrt{2}\,|v|/\sigma_v} .$$

The Cramér-Rao bound for estimating $A$ from $s^n$ can be obtained by partial differentiation of the log-likelihood function followed by an expectation, and is given by

$$B(A; s^n) = \frac{\sigma_v^2}{2n} . \qquad (4.26)$$

Note that in this case the sample mean $\hat{A}[n]$ is not asymptotically efficient with respect to $B(A; s^n)$ from (4.26), since it incurs a $10 \log_{10} 2 \approx 3$ dB loss. Hence, the encoder/estimator structure (4.19) operating on the sample mean $\hat{A}[n]$ incurs a 3 dB loss. Alternatively, consider using the quantizer bias control-based encoder/estimator structure described by (4.15) and (4.19b). The associated information loss (2.4) in the case of the Laplacian PDF is minimized at $A_* = 0$. Using the expression for $B(A; y^n)$ given by (2.15) with $\sigma = \sigma_v$, we obtain

$$\mathcal{L}(A_*) = \frac{B(0; y)}{\sigma_v^2/2} = 1 .$$

Interestingly, at $A_* = 0$ this quantizer bias control encoder with feedback incurs no information loss. Hence, in the Laplacian case we may expect the quantizer bias control-based method described by (4.15) and (4.19b) to asymptotically outperform the encoding/estimation pair (4.12)-(4.19b).
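Both Laplacian quantities can be checked numerically. The Python sketch below integrates the location Fisher information $\int (p_v')^2/p_v\, dv$ on a grid to recover the per-sample bound in (4.26). It then compares that bound with the bound for one bit thresholded exactly at $A$, using the binary-observation Fisher information $4 p_v(0)^2$, the same standard expression as for the Gaussian case. The grid parameters and function names are hypothetical choices.

```python
import math

SQRT2 = math.sqrt(2.0)

def laplacian_pdf(v, sigma_v):
    """Zero-mean Laplacian density with variance sigma_v**2."""
    return math.exp(-SQRT2 * abs(v) / sigma_v) / (SQRT2 * sigma_v)

def location_fisher_info(pdf, sigma_v, half_width=40.0, steps=200_000):
    """Midpoint-rule integral of p'(v)**2 / p(v) over [-half_width*sigma_v, half_width*sigma_v].

    p' is approximated by central differences. The midpoint grid never lands
    on v = 0, where the Laplacian density has a kink.
    """
    a = half_width * sigma_v
    dv = 2.0 * a / steps
    h = 1e-6 * sigma_v
    total = 0.0
    for k in range(steps):
        v = -a + (k + 0.5) * dv
        p = pdf(v, sigma_v)
        dp = (pdf(v + h, sigma_v) - pdf(v - h, sigma_v)) / (2.0 * h)
        total += dp * dp / p * dv
    return total

sigma_v = 0.1
crb_per_sample = 1.0 / location_fisher_info(laplacian_pdf, sigma_v)  # (4.26) with n = 1
crb_bit = 1.0 / (4.0 * laplacian_pdf(0.0, sigma_v) ** 2)             # one bit, threshold at A
print(crb_per_sample / (sigma_v ** 2 / 2))   # ~1: B(A; s) = sigma_v^2 / 2
print(crb_bit / crb_per_sample)              # ~1: no information loss at the threshold
```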

This is indeed the case, as demonstrated in Fig. 4-8, where we depict the MSE performance of these two methods in the Laplacian case for $A = 1$ and $\sigma_v = 0.1$, along with $B(A; s^n)$ (lower dotted line) and the MSE of the sample mean (upper dotted line). As we can see, the method encoding the difference between the sample mean and the current host estimate (solid curve) asymptotically achieves the sample-mean MSE rate (i.e., a 3 dB loss). In contrast, the quantizer bias control method encoding the difference between $s[n]$ and $\check{A}[n-1]$ (dash-dot curve) leads to an estimate that is asymptotically efficient with respect to the original observation sequence $s[n]$.

Figure 4-8: MSE performance of the host estimator in Laplacian sensor noise. The sensor estimate encoded in each case is the sample mean (solid), the sensor measurement $s[n]$ (dash-dot), and the ML estimate (dashed). The two dotted lines depict the Cramér-Rao bound for estimating $A$ given $s^n$ (lower) and $\sigma_v^2/n$ (upper).
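The comparison in Fig. 4-8 can be reproduced qualitatively with a Python Monte Carlo sketch. The sketch rests on several assumptions, all of them ours. The parameters are $A = 0.3$ and $\sigma_v = 1$ rather than the values above, so that no coarse stage is needed ($n_0 = 0$). The sample-mean encoder uses $\lambda = 1$. The quantizer bias control encoder (4.15) is decoded with the recursion of (4.19b) using $\lambda = 1/\sqrt{2}$. That value comes from a standard stochastic-approximation argument rather than from this thesis: for Laplacian noise the asymptotic variance of this recursion is $\lambda^2\sigma_v^2/[(2\sqrt{2}\lambda - 1)n]$, which equals the bound $\sigma_v^2/(2n)$ at $\lambda = 1/\sqrt{2}$.

```python
import random

def laplacian_noise(rng, sigma_v):
    """Zero-mean Laplacian sample with variance sigma_v**2 (scale sigma_v / sqrt(2))."""
    scale = sigma_v / 2 ** 0.5
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)

def compare_encoders(A, sigma_v, n_max, trials, lam_mean=1.0, lam_qbc=2 ** -0.5, seed=0):
    """Return n * MSE / sigma_v**2 at n = n_max for two host estimates driven by the
    same Laplacian measurements: (i) the sample mean encoded via (4.19), and
    (ii) quantizer bias control (4.15) decoded with the recursion of (4.19b)."""
    rng = random.Random(seed)
    err_mean = err_qbc = 0.0
    for _ in range(trials):
        total, host_mean, host_qbc = 0.0, 0.0, 0.0
        for n in range(1, n_max + 1):
            s = A + laplacian_noise(rng, sigma_v)
            total += s
            y_mean = 1 if total / n >= host_mean else -1   # sgn(A_hat[n] - A_check[n-1])
            y_qbc = 1 if s >= host_qbc else -1             # sgn(s[n] - A_check[n-1]), (4.15)
            host_mean += lam_mean * sigma_v / n * y_mean
            host_qbc += lam_qbc * sigma_v / n * y_qbc
        err_mean += (host_mean - A) ** 2
        err_qbc += (host_qbc - A) ** 2
    scale = n_max / (trials * sigma_v ** 2)
    return err_mean * scale, err_qbc * scale

mean_ratio, qbc_ratio = compare_encoders(A=0.3, sigma_v=1.0, n_max=2000, trials=400)
print(mean_ratio, qbc_ratio)   # roughly 1 (3 dB above the bound) and roughly 0.5 (the bound)
```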


4.3.3  NonGaussian  Admissible  Noise 

Whenever the sensor can form an estimate $\hat{A}[n]$ from $s^n$ for which the mean-square difference between the successive estimates $\hat{A}[n]$ and $\hat{A}[n-1]$ decays as $1/n^2$, the encoder/estimator structure (4.19) can be used to provide a host estimate $\check{A}[n]$ whose asymptotic MSE equals that of $\hat{A}[n]$. In particular, suppose that

$$\lim_{n \to \infty} n^2\, E\left[e^2[n]\right] = \gamma\, \sigma_v^2 , \qquad (4.27)$$

where $0 < \gamma < \infty$ and

$$e[n] = \hat{A}[n] - \hat{A}[n-1] . \qquad (4.28)$$

Then, as shown in App. C.3, the residual error between the sensor estimate $\hat{A}[n]$ and the associated host estimate $\check{A}[n-1]$ has the form (4.21), implying that the asymptotic MSE of $\check{A}[n]$ is the same as that of $\hat{A}[n]$. The optimal value of $\lambda$ in (4.19) can be found by minimizing the associated $\beta(\lambda)$ in (4.21). Specifically, as we also show in the appendix,

$$\beta(\lambda) \approx \frac{\pi\left(\gamma + \lambda^2\right)^2}{8 \lambda^2} , \qquad (4.29)$$

so that

$$\min_{\lambda} \beta(\lambda) \approx \pi\gamma/2 ,$$

which is achieved for $\lambda = \sqrt{\gamma}$.
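The closed forms (4.25) and (4.29) are consistent with the following heuristic fixed-point argument. It is a sketch, not the derivation of App. C.3. It assumes that the residual $r[n] = \hat{A}[n] - \check{A}[n-1]$ is approximately zero-mean Gaussian with variance $\beta\sigma_v^2/n^2$, and it neglects the cross terms between $e[n+1]$ and $r[n]$. For $n > n_0$, (4.19) gives

$$
\begin{aligned}
r[n+1] &= r[n] + e[n+1] - \frac{\lambda\sigma_v}{n}\operatorname{sgn}\left(r[n]\right) , \\
E\left[r^2[n+1]\right] &\approx E\left[r^2[n]\right] + E\left[e^2[n+1]\right] + \frac{\lambda^2\sigma_v^2}{n^2} - \frac{2\lambda\sigma_v}{n}\, E\left[\,|r[n]|\,\right] .
\end{aligned}
$$

Substitute $E[r^2[n]] \approx \beta\sigma_v^2/n^2$, $E[e^2[n+1]] \approx \gamma\sigma_v^2/n^2$ from (4.27), and the Gaussian moment $E[|r[n]|] \approx \sqrt{2\beta/\pi}\,\sigma_v/n$. Note also that $E[r^2[n+1]] - E[r^2[n]] = O(1/n^3)$. Matching the $1/n^2$ terms then gives

$$
0 \approx \gamma + \lambda^2 - 2\lambda\sqrt{\frac{2\beta}{\pi}}
\quad\Longrightarrow\quad
\beta(\lambda) \approx \frac{\pi\left(\gamma + \lambda^2\right)^2}{8\lambda^2} .
$$

This reduces to (4.25) for $\gamma = 1$, the value that (4.23) implies for the sample mean.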

Under a mild set of conditions on the sensor noise PDF, the ML estimator $\hat{A}_{\mathrm{ML}}[n]$ based on observation of $s^n$ is asymptotically efficient with respect to $B(A; s^n)$, is asymptotically Gaussian distributed, and also satisfies (4.27) [12]. In these cases, when the sensor estimate computed is the ML estimate $\hat{A}_{\mathrm{ML}}[n]$ formed from $s^n$, the block diagrams in Figs. 4-3 and 4-4 describe a general algorithmic method for obtaining an asymptotically efficient encoding.

Fig. 4-8 also depicts the MSE performance of the host estimator of the method (4.19) in the case that the sensor estimate $\hat{A}[n]$ is the ML estimate. In this Laplacian scenario the ML estimate is the median of the $n$ observations $s[1], s[2], \ldots, s[n]$, and it is asymptotically efficient with respect to $B(A; s)/n$. As the dashed curve in the figure reveals, the associated host estimate $\check{A}[n]$ based on the encodings is also asymptotically efficient.
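The median-based variant can be sketched in Python by maintaining the observations in sorted order with `bisect.insort`, so the running median costs only a list insertion per sample. As in the previous sketches, the parameters $A = 0.3$, $\sigma_v = 1$, $\lambda = 1$ and $n_0 = 0$ are illustrative assumptions. With this choice $\lambda$ is not necessarily optimal, since the value of $\gamma$ for the median is not computed here.

```python
import bisect
import random

def laplacian_noise(rng, sigma_v):
    """Zero-mean Laplacian sample with variance sigma_v**2 (scale sigma_v / sqrt(2))."""
    scale = sigma_v / 2 ** 0.5
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)

def median_encoder_host(A, sigma_v, lam, n_max, rng):
    """Encode the running median (the Laplacian ML estimate) with (4.19a) and
    decode with (4.19b), using n0 = 0 and A_check[0] = 0. Returns A_check[n_max]."""
    sorted_obs, a_check = [], 0.0
    for n in range(1, n_max + 1):
        bisect.insort(sorted_obs, A + laplacian_noise(rng, sigma_v))
        mid = n // 2
        if n % 2:
            a_hat = sorted_obs[mid]                                 # odd n: middle sample
        else:
            a_hat = 0.5 * (sorted_obs[mid - 1] + sorted_obs[mid])  # even n: mean of middle pair
        y = 1 if a_hat >= a_check else -1                           # (4.19a)
        a_check += lam * sigma_v / n * y                            # (4.19b)
    return a_check

rng = random.Random(2)
A, sigma_v, n_max, trials = 0.3, 1.0, 2000, 300
mse = sum((median_encoder_host(A, sigma_v, 1.0, n_max, rng) - A) ** 2
          for _ in range(trials)) / trials
print(n_max * mse / sigma_v ** 2)   # close to 0.5, i.e. B(A; s^n) = sigma_v^2 / (2n)
```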


4.3.4  Uniformly  Distributed  Noise 

As we have already mentioned in Section 4.3.2, the encoder/estimator structure described by (4.19) is remarkably robust. As an illustration of this fact, in this section we consider estimation in IID uniformly distributed noise. In this case, the first-order PDF of $v[n]$ is given by

$$p_v(v) = \begin{cases} \dfrac{1}{2\sqrt{3}\,\sigma_v} & \text{if } |v| \le \sqrt{3}\,\sigma_v \\[4pt] 0 & \text{otherwise.} \end{cases}$$

This noise process does not belong to the admissible class we defined in Chapter 2. As is well known, a Cramér-Rao bound for this estimation problem does not exist. This is consistent with the fact that there exist estimators $\hat{A}[n]$ of $A$ based on $s^n$ whose MSE decays faster than $1/n$. For instance, the MSE of the estimator

$$\hat{A}[n] = \frac{\max\{s[1], s[2], \ldots, s[n]\} + \min\{s[1], s[2], \ldots, s[n]\}}{2} \qquad (4.30)$$

decays as $1/n^2$:

$$E\left[\left(\hat{A}[n] - A\right)^2\right] = \frac{6\,\sigma_v^2}{(n+1)(n+2)} . \qquad (4.31)$$
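Formula (4.31) can be checked by a short Monte Carlo experiment in Python. The support $[-\sqrt{3}\sigma_v, \sqrt{3}\sigma_v]$ gives unit-variance scaling consistent with the PDF above. The values of $A$, $n$ and the trial count are arbitrary.

```python
import random

def midrange_mse(A, sigma_v, n, trials, seed=0):
    """Monte Carlo MSE of the midrange estimator (4.30) in uniform noise of variance sigma_v**2."""
    rng = random.Random(seed)
    half_width = 3 ** 0.5 * sigma_v            # support [-sqrt(3) sigma_v, sqrt(3) sigma_v]
    total = 0.0
    for _ in range(trials):
        s = [A + rng.uniform(-half_width, half_width) for _ in range(n)]
        a_hat = 0.5 * (max(s) + min(s))        # (4.30)
        total += (a_hat - A) ** 2
    return total / trials

n, sigma_v = 50, 1.0
simulated = midrange_mse(A=0.4, sigma_v=sigma_v, n=n, trials=20000)
predicted = 6 * sigma_v ** 2 / ((n + 1) * (n + 2))   # (4.31)
print(simulated / predicted)                         # close to 1
```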


Even though the residual error between $\hat{A}[n]$ and $\check{A}[n]$ from the encoder/estimator pair (4.19) decays at best as $1/n^2$ (see Theorem 2), by proper choice of $\lambda$ we can use the pair (4.19) to effectively achieve the performance of $\hat{A}[n]$ in (4.30), as we demonstrate next.

The simulated MSE performance of the host estimate from (4.19) for $\lambda = 1$, where the sensor estimate $\hat{A}[n]$ is given by (4.30), is depicted in Fig. 4-9. Note that since the variance of $e[n]$ defined in (4.28) decays faster than $1/n^2$, $\gamma$ from (4.27) equals zero. For any $\lambda > 0$ the asymptotic residual error scaling $\beta$ is given by (4.29) with $\gamma = 0$, so in this example we have $\beta = \pi/8$. Consequently, the asymptotic MSE of the host estimate $\check{A}[n]$ can be approximated as

$$E\left[\left(\check{A}[n] - A\right)^2\right] \lesssim E\left[\left(\hat{A}[n+1] - \check{A}[n]\right)^2\right] + E\left[\left(\hat{A}[n+1] - A\right)^2\right] \lesssim \left(6 + \pi/8\right) \sigma_v^2 / n^2 . \qquad (4.32)$$

Figure 4-9: The dash-dot and solid curves show the host estimate MSE in uniformly distributed sensor noise when the sample mean and the estimator (4.30), respectively, are encoded at the sensor. For reference, the bound (4.32), $\sigma_v^2/n$, and the MSE of $\hat{A}[n]$ in (4.30) are depicted by the lower dotted, upper dotted, and dashed curves, respectively.


Combining (4.32) and (4.31) suggests that the encoder/decoder pair described by (4.19) and (4.30) incurs an asymptotic processing loss over the sensor estimate $\hat{A}[n]$ from (4.30) of about

$$\mathcal{L}_{\mathrm{proc}}(A) \approx 1 + \frac{\pi}{48} ,$$

corresponding to only about 0.28 dB.
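The arithmetic behind the 0.28 dB figure is short. The Python snippet below takes the sensor-estimate scaling $6\sigma_v^2/n^2$ from (4.31) for large $n$ and the residual scaling $\beta = \pi/8$ from (4.29) with $\gamma = 0$ and $\lambda = 1$.

```python
import math

beta_uniform = math.pi / 8        # (4.29) with gamma = 0 and lambda = 1
sensor_scaling = 6.0              # n^2 * MSE / sigma_v^2 of (4.30), from (4.31) for large n
loss = (sensor_scaling + beta_uniform) / sensor_scaling
print(loss, 1 + math.pi / 48)     # the same value up to rounding
print(round(10 * math.log10(loss), 2))  # 0.28
```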


4.4  Network  Extensions 

Multi-sensor extensions of all the preceding single-sensor encoding and estimation algorithms can be constructed that retain the asymptotic optimality properties of the original single-sensor schemes. As in Chapter 3, we assume that $s_\ell[n]$, the $n$th observation collected at the $\ell$th sensor, is given by

$$s_\ell[n] = A + v_\ell[n] ,$$

where the sequences $v_\ell[n]$ are independent IID noise sequences.

By designing the $\ell$th encoder according to the single-sensor principles and then properly combining the $L$ symbol streams, we can obtain asymptotically optimal estimates. As an illustration of the design of such multi-sensor extensions, we briefly consider fixed-rate encodings. Let $y_\ell[n]$ denote the sequence encoded at the $\ell$th sensor. Let $\check{A}_\ell[n]$ denote the asymptotically optimal host estimate that results from applying the encoding strategy (4.19) to the consistent sensor estimate $\hat{A}_\ell[n]$, formed at the $\ell$th sensor from

$$s_\ell^n = \left[s_\ell[1]\ \ s_\ell[2]\ \cdots\ s_\ell[n]\right]^T .$$

The $\ell$th encoder is depicted in Fig. 4-3, with $s[n]$, $y[n]$, $\hat{A}[n]$, and $\check{A}[n]$ replaced by $s_\ell[n]$, $y_\ell[n]$, $\hat{A}_\ell[n]$, and $\check{A}_\ell[n]$, respectively.

For simplicity and without loss of generality we consider the case where the original $\hat{A}_\ell[n]$'s are asymptotically efficient with respect to $B(A; s_\ell^n)$. In that case, the estimate

$$\check{A}[n] = \frac{1}{\sum_{\ell=1}^{L} \left[B(0; s_\ell)\right]^{-1}} \sum_{\ell=1}^{L} \frac{\check{A}_\ell[n]}{B(0; s_\ell)} \qquad (4.33)$$

provides an asymptotically efficient estimator of $A$ from $s_1^n, s_2^n, \ldots, s_L^n$. Here $\check{A}_\ell[n]$ is the estimate formed solely from the encodings of the $\ell$th sensor. Consider now the general case where the $\hat{A}_\ell[n]$'s are consistent but not necessarily efficient estimates, with known MSE rates that are independent of the unknown parameter $A$. Then $\check{A}[n]$ from (4.33) provides an asymptotically optimal estimate, provided we replace $B(0; s_\ell)$ with $E\left[\left(\hat{A}_\ell[n] - A\right)^2\right]$.
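The combination (4.33) is an inverse-variance weighted average. The Python sketch below (function names hypothetical) implements the weighting. It also computes the variance of the combination under the additional assumption that the per-sensor estimates are independent and unbiased, which makes the gain from fusing $L$ sensors explicit.

```python
def combine_host_estimates(estimates, variances):
    """Weighted combination (4.33): weight each sensor's host estimate by the inverse of
    its per-sample Cramer-Rao bound B(0; s_l) or, in the general case, of its MSE."""
    weights = [1.0 / v for v in variances]
    return sum(w * a for w, a in zip(weights, estimates)) / sum(weights)

def combined_variance(variances):
    """Variance of the weighted combination for independent, unbiased estimates."""
    return 1.0 / sum(1.0 / v for v in variances)

print(combine_host_estimates([1.0, 2.0], [1.0, 3.0]))  # about 1.25: the more accurate sensor dominates
print(combined_variance([1.0, 3.0]))                   # 0.75
```

With identical variances the weights are equal and (4.33) reduces to a plain average, which is the case treated next.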

Finally, in the special case that the PDFs of the sensor noises are identical, i.e., $p_{v_\ell}(x) = p_v(x)$ almost everywhere, the host estimator (4.33) reduces to

$$\check{A}[n] = \check{A}[n-1] + \frac{\lambda\sigma_v}{n}\, \frac{1}{L} \sum_{\ell=1}^{L} y_\ell[n]$$

for $n > n_0$, where we also used (4.19b). This decoder is also depicted in Fig. 4-4, provided we replace $y[n]$ with $\sum_{\ell=1}^{L} y_\ell[n]/L$.
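The reduction can be confirmed numerically. The Python sketch below runs $L$ independent sample-mean encoders of the form (4.19), each with its own host recursion, under the illustrative assumptions $n_0 = 0$ and $\check{A}_\ell[0] = 0$. It checks that the plain average of the per-sensor host estimates, i.e. (4.33) with equal weights, coincides with the single recursion driven by the averaged bits.

```python
import random

def check_averaged_decoder(A, sigma_v, lam, L, n_max, seed=0):
    """Return (largest gap between the two decoders, final combined estimate)."""
    rng = random.Random(seed)
    totals = [0.0] * L
    per_sensor = [0.0] * L            # A_check_l[n-1] for each sensor
    combined = 0.0                    # recursion driven by sum_l y_l[n] / L
    worst_gap = 0.0
    for n in range(1, n_max + 1):
        bits = []
        for k in range(L):
            totals[k] += A + rng.gauss(0.0, sigma_v)
            # each sensor compares its own sample mean with its own A_check_l[n-1]
            y = 1 if totals[k] / n >= per_sensor[k] else -1
            per_sensor[k] += lam * sigma_v / n * y
            bits.append(y)
        combined += lam * sigma_v / n * sum(bits) / L
        worst_gap = max(worst_gap, abs(combined - sum(per_sensor) / L))
    return worst_gap, combined

gap, estimate = check_averaged_decoder(A=0.3, sigma_v=1.0, lam=1.0, L=4, n_max=1000)
print(gap, estimate)   # gap at round-off level; estimate near 0.3
```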


Chapter  5 


Encoding and Estimation with Quantizer Bias Control: Time-Varying Case


In Chapters 2 and 3 we focused our attention on estimating a static signal from noisy measurements, in the context of encoders composed of a control input added to each measurement prior to quantization. We developed optimized encodings of the noisy measurements into digital sequences, and asymptotically efficient estimators from these encodings, for a number of scenarios of practical interest. Our static-case analysis has revealed a number of key characteristics of this signal estimation problem. However, the systems we designed prove inadequate when the information-bearing signal varies fast enough to invalidate the static signal assumption across the observation interval used to form the estimates. In designing encoding strategies for the general time-varying case, we generally need to take into account the characteristics of the information-bearing signal, namely, the signal model and dynamics. However, as we show in this chapter, for a particular class of time-varying extensions we can develop a rich class of encoding strategies and signal estimators by building on the principles that we developed for the static case.

In this chapter we develop generalizations of the framework of Chapters 2 and 3 that encompass a number of time-varying information-bearing signals. Throughout this chapter we focus on information-bearing signals and sensor noises that are well modeled as Gaussian processes. In the general time-varying case, we usually have to rely on coarse measurements from multiple sensors to obtain accurate signal estimates. This is clearly illustrated by an extreme case. Suppose the information-bearing signal is well modeled as an IID Gaussian process, and the host must estimate this signal from encoded bits collected from a single sensor that measures it in statistically independent IID sensor noise. Since the information-bearing signal and the sensor noise are independent IID processes, for any fixed-rate binary encoding scheme (such as encoders of the form of quantizer bias control), the encoded bit at any given time instant $n$ can only provide information about the current signal sample. Furthermore, since past and future encodings do not provide any information for estimating the current signal sample, the problem of estimating any signal sample from the whole encoded sequence reduces to estimating the associated Gaussian random variable by observing a single encoded bit. Clearly, the ability of the host to estimate this Gaussian random variable based on a single bit is severely limited.
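How limited a single bit is can be quantified in the simplest setting. The Python sketch below assumes a zero-mean signal sample $A \sim \mathcal{N}(0, \sigma_A^2)$ and a zero control input, so the bit is $b = \operatorname{sgn}(A + v)$. These are simplifying assumptions of ours. It uses the standard Gaussian conditional mean $E[A \mid b] = b\, \sigma_A^2 \sqrt{2/\pi} / \sqrt{\sigma_A^2 + \sigma_v^2}$. Even without sensor noise, the resulting MMSE is $(1 - 2/\pi)\sigma_A^2 \approx 0.36\,\sigma_A^2$.

```python
import math
import random

def one_bit_mmse(sigma_a, sigma_v):
    """MMSE of A ~ N(0, sigma_a^2) given only b = sgn(A + v), with v ~ N(0, sigma_v^2)."""
    return sigma_a ** 2 - (2.0 / math.pi) * sigma_a ** 4 / (sigma_a ** 2 + sigma_v ** 2)

def one_bit_mmse_monte_carlo(sigma_a, sigma_v, trials=200_000, seed=0):
    """Monte Carlo check of one_bit_mmse using the conditional-mean estimate gain * b."""
    rng = random.Random(seed)
    gain = sigma_a ** 2 / math.sqrt(sigma_a ** 2 + sigma_v ** 2) * math.sqrt(2.0 / math.pi)
    total = 0.0
    for _ in range(trials):
        a = rng.gauss(0.0, sigma_a)
        b = 1.0 if a + rng.gauss(0.0, sigma_v) >= 0 else -1.0
        total += (a - gain * b) ** 2     # squared error of E[A | b] = gain * b
    return total / trials

print(one_bit_mmse(1.0, 0.0))                                        # 1 - 2/pi, about 0.363
print(one_bit_mmse(1.0, 1.0), one_bit_mmse_monte_carlo(1.0, 1.0))    # both about 0.68
```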

To overcome this problem, in this chapter we focus on signal estimation based on data collected from a network of sensors, each encoding one bit of information per measurement. In particular, we focus on the special case where the information-bearing signal is perfectly correlated spatially over the sensor network, i.e., at any time instant all sensors observe the same signal (in noise).¹ In addition, we assume that the sensor noise samples are independent in both time and space.

In Section 5.1 we present the class of time-varying signal models that we consider in this chapter. In Section 5.2 we state the figures of merit that we use to construct encoders and estimators for this class of time-varying signals. In Section 5.3 we present a number of methods that can be used to encode the noisy measurements at each sensor into bit streams. In Section 5.4 we sketch some of the methods that can be used to estimate the underlying information-bearing signal by intelligently fusing these bit streams at the host. Finally, in Section 5.5 we consider an example involving a simple signal model, which we use as a vehicle for illustrating the design and the performance characteristics of the schemes presented in Sections 5.3-5.4.


¹ We may also want to consider the dual problem, where the information-bearing signal is static in time but partially correlated across the sensor network. Some of our analysis in this chapter carries through in this case, with appropriate modifications. Although beyond the scope of this thesis, a very interesting problem worth further investigation is the case where the signal samples are partially correlated in both time and space. Indeed, a number of very interesting decentralized data fusion problems arise in that context; see [19, 10, 5, 6] and the references therein.


5.1  System  Model 


Throughout this chapter we focus our attention on the problem of estimating a single information-bearing signal $A[n]$ given by

$$A[n] = \mathbf{q}^T \mathbf{x}[n] , \qquad (5.1a)$$

where

$$\mathbf{x}[n] = \left[x_1[n]\ \ x_2[n]\ \cdots\ x_R[n]\right]^T \qquad (5.1b)$$

is a state-space vector that obeys the dynamics

$$\mathbf{x}[n] = \mathbf{G}\, \mathbf{x}[n-1] + \mathbf{h}\, u[n] , \qquad (5.1c)$$

and where $\mathbf{G}$ is a known $R \times R$ matrix, $\mathbf{h}$ is a known $R \times 1$ vector, and $u[n]$ is a zero-mean IID Gaussian process of variance $\sigma_u^2$.

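The linear state-space model (5.1) describing the dynamics of the information-bearing signal is fairly general. As is well known, it encompasses a number of broadly used signal models, including the autoregressive (AR), the moving-average (MA), and the autoregressive moving-average (ARMA) models [1]. For instance, an $R$th-order AR model of the form

$$A[n] = \sum_{i=1}^{R} a_i\, A[n-i] + u[n]$$

can be readily described via (5.1) by letting

$$\mathbf{q} = \left[1\ \ \mathbf{0}_{1 \times (R-1)}\right]^T ,$$

$[\mathbf{h}]_1 = 1$ (with all other entries of $\mathbf{h}$ equal to zero), and

$$\mathbf{G} = \begin{bmatrix} a_1 & a_2 & \cdots & a_{R-1} & a_R \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} .$$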
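The AR-to-state-space mapping can be checked in a few lines of Python. The sketch assumes the state $\mathbf{x}[n] = [A[n]\ A[n-1]\ \cdots\ A[n-R+1]]^T$, which is the standard choice consistent with $\mathbf{q}$ above. The coefficient values are arbitrary. It drives the companion-form recursion (5.1c) and the direct AR recursion with the same innovations and compares their outputs.

```python
import random

def ar_to_state_space(a):
    """Companion-form (G, q, h) for A[n] = sum_i a_i A[n-i] + u[n],
    with the assumed state x[n] = [A[n], A[n-1], ..., A[n-R+1]]^T."""
    R = len(a)
    G = [[0.0] * R for _ in range(R)]
    G[0] = [float(c) for c in a]          # first row: AR coefficients
    for i in range(1, R):
        G[i][i - 1] = 1.0                 # remaining rows: delay the state by one step
    q = [1.0] + [0.0] * (R - 1)
    h = [1.0] + [0.0] * (R - 1)
    return G, q, h

def state_update(G, h, x, u):
    """x[n] = G x[n-1] + h u[n], as in (5.1c)."""
    return [sum(g * xj for g, xj in zip(row, x)) + hi * u for row, hi in zip(G, h)]

a = [0.5, -0.3, 0.1]
G, q, h = ar_to_state_space(a)
rng = random.Random(0)
x = [0.0] * len(a)                        # zero initial state
past = [0.0] * len(a)                     # A[n-1], ..., A[n-R], most recent first
max_gap = 0.0
for _ in range(200):
    u = rng.gauss(0.0, 1.0)
    x = state_update(G, h, x, u)
    a_state = sum(qi * xi for qi, xi in zip(q, x))           # (5.1a)
    a_direct = sum(ai * p for ai, p in zip(a, past)) + u     # direct AR recursion
    past = [a_direct] + past[:-1]
    max_gap = max(max_gap, abs(a_state - a_direct))
print(max_gap)   # 0.0 up to round-off
```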
We consider an $L$-sensor scenario in which the $n$th measurement at the $\ell$th sensor is given by

$$s_\ell[n] = A[n] + v_\ell[n] , \qquad (5.2)$$

where the sensor noise sequences $v_\ell[n]$ are statistically independent zero-mean IID Gaussian processes with variance $\sigma_v^2$, independent of the information-bearing signal $A[n]$. At time $n$, the $\ell$th sensor encodes the measurement $s_\ell[n]$ by means of quantizer bias control, i.e.,

$$y_\ell[n] = \operatorname{sgn}\left(s_\ell[n] + w_\ell[n]\right) , \qquad (5.3)$$

where $y_\ell[n]$ and $w_\ell[n]$ denote the encoded bit and the control input used at the $\ell$th sensor at time $n$, respectively. For compactness, we rewrite the encoding equation (5.3) in matrix form as

2/i  N 
2/2  N 

yL[n\ 

Our  objective  is  to  design  the  control  inputs  we[n\  used  at  the  sensors  and  the  associated 
estimators  at  the  host  based  on  the  state-space  model  given  by  (5.1),  (5.2)  and  (5.4),  so  as 
to  enable  the  host  to  obtain  accurate  signal  estimates. 


sgn  (si  [n]  -I-  wi[nj) 
sgn  (s2[n]  +  w2[n]) 

sgn  (si,[n]  +  wL[n]) 
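One time step of the encoder (5.2)-(5.4) can be sketched as follows. This is an illustration only; the function name `encode_sensors` is hypothetical, and mapping the (probability-zero) tie $s_\ell[n] + w_\ell[n] = 0$ to $+1$ is an assumed convention.

```python
import numpy as np


def encode_sensors(A_n, sigma_v, w, rng):
    """One time step of the L-sensor encoder (5.2)-(5.4).

    A_n     : signal sample A[n], common to all sensors
    sigma_v : sensor noise standard deviation
    w       : length-L array of control inputs w_l[n]
    Returns the length-L vector of bits y_l[n] in {-1, +1}.
    """
    w = np.asarray(w, dtype=float)
    s = A_n + sigma_v * rng.standard_normal(w.shape)   # s_l[n] = A[n] + v_l[n]
    # sgn(.) with the tie s + w == 0 mapped to +1 (assumed convention)
    return np.where(s + w >= 0.0, 1, -1)
```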


5.2  Performance  Measures 


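Consistent with our previous developments, to assess the quality of the encoding and the associated estimator we compare its performance against that obtained from the original measurements $s_\ell[n]$, i.e.,

$$ \begin{bmatrix} s_1[n] \\ s_2[n] \\ \vdots \\ s_L[n] \end{bmatrix} = \mathbf{1}_{L \times 1}\, \mathbf{q}^T \mathbf{x}[n] + \begin{bmatrix} v_1[n] \\ v_2[n] \\ \vdots \\ v_L[n] \end{bmatrix} . \tag{5.5} $$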
In particular, to design the encoder at any given time instant $n$ we use as performance metric the encoding (information) loss associated with estimation of $A[n]$ via

$$ \mathbf{y}[n] \triangleq \left[ y_1[n] \;\; y_2[n] \;\; \cdots \;\; y_L[n] \right]^T \tag{5.6} $$

instead of

$$ \mathbf{s}[n] \triangleq \left[ s_1[n] \;\; s_2[n] \;\; \cdots \;\; s_L[n] \right]^T . \tag{5.7} $$


Since we are interested in minimizing

$$ \mathrm{MSE}(N) = \frac{1}{N} \sum_{n=1}^{N} E\left[ \left( \hat{A}[n] - A[n] \right)^2 \right] , \tag{5.8} $$

we use average rather than worst-case (information loss) performance as our criterion for constructing the encodings. Specifically, our figure of merit for designing the encodings at time $n$ is the average information loss in estimating $A[n]$ based on observation of the $L \times 1$ vector $\mathbf{y}[n]$ instead of the vector $\mathbf{s}[n]$:

$$ \mathcal{L}(A[n]; n) \triangleq \frac{\mathcal{B}(A[n], \mathbf{y}[n])}{\mathcal{B}(A[n], \mathbf{s}[n])} . \tag{5.9} $$


Since the sensor noise sequences are independent IID processes, minimizing the encoding loss (5.9) for each $n$ also minimizes the average encoding loss over all $n$. Similarly, to assess the performance of the estimator, we use as our figure of merit the average MSE loss, defined as the MSE performance (5.8) based on the sequence $\{\mathbf{y}[n]\}_{n=1}^{N}$ divided by the associated MSE performance based on the sequence $\{\mathbf{s}[n]\}_{n=1}^{N}$.

As in the static case, both the design and the performance of systems employing quantizer bias control are dictated by the available freedom and the processing complexity in forming $w_\ell[n]$, as well as by the number of sensors in the network. In this time-varying case, however, the encoding performance of (5.4) also depends on the particular signal characteristics (5.1). As we show, in many cases of practical interest there exist naturally suited measures of SNR that can be used to describe the encoding and estimation performance. In particular, because the signal is perfectly correlated spatially across the sensors, when designing the encoding at any time instant it is convenient to view the encodings obtained from all the sensors in the network as equivalent to a temporal sequence of encodings of a static signal obtained from a single sensor. We first consider the design of the encoder and subsequently address the estimation problem.


5.3  Encoding  Algorithms 

In this section we focus on designing encoding strategies based on quantizer bias control characterized by the control sequences $w_\ell[n]$. As we demonstrate, we can build on the principles we developed for the static case to develop a rich class of efficient encoding strategies for time-varying signals. We next develop encoders employing pseudo-noise control inputs, control inputs based on feedback, and finally, combinations of pseudo-noise and feedback; similar strategies can be developed for control inputs known to the host, as well as any combinations thereof with pseudo-noise and feedback-based control inputs.

5.3.1  Pseudo-noise  Control  Inputs 

In this section we consider the case where the control input sequences $w_1[n], w_2[n], \ldots, w_L[n]$ in (5.3) are statistically independent IID Gaussian processes, each with power level $\sigma_w^2$. The objective is to select the pseudo-noise power level so as to minimize the average information loss of the form (5.9) that occurs when estimating $A[n]$ (for a fixed $n$) based on the $L \times 1$ vector $\mathbf{y}[n]$ in (5.6) instead of $\mathbf{s}[n]$ in (5.7), where $y_\ell[n]$ denotes the output of the encoder (5.3) using binary quantizer bias control on $s_\ell[n]$ given by (5.2).

The dual interpretation of the signal encodings obtained at a given time instant from the sensor network as a temporally encoded sequence of a static signal obtained from a single sensor is extremely convenient, since it readily allows us to exploit the encoding principles we developed for the static case. As expected from the static case analysis, at any given time $n$, for pseudo-noise control inputs the optimal pseudo-noise level and the associated encoding performance are functions of the signal power level, namely,

$$ \sigma_A^2[n] = \mathrm{var}\left( A[n] \right) , $$

and of the sensor noise level $\sigma_v$. In particular, following the analysis for pseudo-noise control inputs presented in Section 3.2.2, the average information loss (5.9) for a given signal strength $\sigma_A[n]$, sensor noise level $\sigma_v$, and pseudo-noise level $\sigma_w$ can be denoted as $\mathcal{L}(\sigma_A[n], \sigma_v, \sigma_w)$ and is thus given by (3.13), where the average encoding performance is given by $\bar{\mathcal{B}}(\sigma_A[n], \sigma_v, \sigma_w)$, defined in Section 3.2.2.

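Note that since the system (5.1) is LTI, $A[n]$ is a zero-mean Gaussian random variable whose variance converges to a constant $\sigma_A^2$ as $n \to \infty$ (provided $\mathbf{G}$ is stable, i.e., all its eigenvalues lie inside the unit circle). Thus, in steady state $\mathcal{L}(\sigma_A[n], \sigma_v, \sigma_w)$ is independent of $n$. We are interested in the steady-state solution, i.e., we wish to select $\sigma_w$ so as to minimize the associated average information loss:

$$ \sigma_w^{\mathrm{opt}} = \arg\min_{\sigma_w} \mathcal{L}(\sigma_A, \sigma_v, \sigma_w) . \tag{5.10} $$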

The optimal steady-state pseudo-noise level is then readily given by (3.22), where $\sigma_{\mathrm{opt}}(1)$ is given by (3.20). The associated optimal average information loss is then given by $\mathcal{L}_{\mathrm{pn}}(\chi)$ from (3.23), where

$$ \chi = \frac{\sigma_A}{\sigma_v} . $$

Comparison of (3.23) and (3.24) reveals that proper use of pseudo-noise across the network improves the encoding efficiency over simply quantizing the measurements, especially at high SNR $\chi$; in particular, for large $\chi$ the information loss (5.9) can be made to grow as slowly as $\chi^2$ by proper selection of the pseudo-noise power level.
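The search in (5.10) can also be carried out numerically. The sketch below does not reproduce (3.13) or (3.20)-(3.23), which are in Chapter 3; as a stand-in it assumes that the average loss equals the per-sensor Cramér-Rao bound from the bits, averaged over the prior $A \sim \mathcal{N}(0, \sigma_A^2)$, divided by $\sigma_v^2$ (the per-sensor bound from $s_\ell[n]$). The function names are hypothetical. Under this assumption the average is finite only when $\sqrt{\sigma_v^2 + \sigma_w^2} > \sigma_A$, so the search interval is restricted accordingly.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import log_ndtr

LOG_SQRT_2PI = 0.5 * np.log(2.0 * np.pi)


def log_crb_bits(A, sigma_alpha):
    """Log of the per-sensor CRB for A from y = sgn(A + alpha), alpha ~ N(0, sigma_alpha^2).

    CRB(A) = sigma_alpha^2 Q(z) (1 - Q(z)) / phi(z)^2 with z = A / sigma_alpha,
    evaluated in the log domain to avoid underflow for large |z|.
    """
    z = A / sigma_alpha
    log_phi = -0.5 * z**2 - LOG_SQRT_2PI
    return 2.0 * np.log(sigma_alpha) + log_ndtr(-z) + log_ndtr(z) - 2.0 * log_phi


def average_loss(sigma_w, sigma_A, sigma_v):
    """Assumed average loss: E_A[CRB_bits(A)] / sigma_v^2 with A ~ N(0, sigma_A^2)."""
    sigma_alpha = np.hypot(sigma_v, sigma_w)

    def integrand(A):
        log_prior = -0.5 * (A / sigma_A) ** 2 - LOG_SQRT_2PI - np.log(sigma_A)
        return np.exp(log_crb_bits(A, sigma_alpha) + log_prior)

    half, _ = quad(integrand, 0.0, np.inf)   # the integrand is even in A
    return 2.0 * half / sigma_v**2


def optimal_pseudo_noise(sigma_A, sigma_v, upper=20.0):
    """Bounded scalar search for the sigma_w minimizing average_loss, cf. (5.10)."""
    # keep sigma_alpha at least 5% above sigma_A so that the expectation stays finite
    lower = np.sqrt(max((1.05 * sigma_A) ** 2 - sigma_v**2, 0.0))
    res = minimize_scalar(lambda sw: average_loss(sw, sigma_A, sigma_v),
                          bounds=(lower, upper * sigma_A), method="bounded")
    return res.x, res.fun
```

For instance, `optimal_pseudo_noise(1.0, 0.1)` returns the minimizing $\sigma_w$ and the corresponding loss for $\chi = 10$ under the stated assumption.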


5.3.2  Encodings  Based  on  Feedback 

Similarly, we may consider cases where feedback from the host to each sensor in the network is available and takes the form

$$ w_\ell[n] = w[n] = f\left( \mathbf{y}[k];\ k < n \right) . \tag{5.11} $$

We would like to determine the performance limits of these strategies and, in addition, to select $f(\cdot)$ so as to optimize the quality of the encodings.

To assess the performance limits of feedback methods it is convenient to consider the following alternative description of $A[n]$ in (5.1):

$$ A[n] = \mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1] + \mathbf{q}^T \mathbf{h}\, u[n] . \tag{5.12} $$
As we may recall from the static case analysis, the encoding performance is optimized if the encoder operates close to the quantizer threshold; ideally, we would like to use a control input $w[n]$ based on past encoded values that is as close to $-A[n]$ as possible. However, since feedback at time $n$ can only depend on past encoded bits, i.e., $\mathbf{y}[k]$ for $k < n$, we can only hope to accurately predict the component $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ of $A[n]$ in (5.12); the term $\mathbf{q}^T \mathbf{h}\, u[n]$ in (5.12) cannot be predicted (and “subtracted” off) via feedback from past encodings, since this signal component is statistically independent of all past observations, i.e., independent of all $\mathbf{s}[k]$ for $k < n$.

Assuming that the term $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ can be accurately predicted via feedback and subtracted from the measurement at each sensor, at any time instant $n$ the information loss across the array is governed by the unpredictable component $\mathbf{q}^T \mathbf{h}\, u[n]$. Consequently, the information loss in the encodings via feedback is lower bounded by $\mathcal{L}_{\mathrm{free}}(\chi_{\mathrm{fb}})$ given by (3.24), where

$$ \chi_{\mathrm{fb}} = \frac{\sigma_u'}{\sigma_v} , \tag{5.13} $$

and where $\sigma_u'^2$ is the power level of the term $\mathbf{q}^T \mathbf{h}\, u[n]$ of $A[n]$ in (5.12) (which cannot be predicted from the past), i.e.,

$$ \sigma_u'^2 = \sigma_u^2\, \mathbf{q}^T \mathbf{h}\, \mathbf{h}^T \mathbf{q} . \tag{5.14} $$

The accuracy with which we can approach the bound $\mathcal{L}_{\mathrm{free}}(\chi_{\mathrm{fb}})$ depends on how accurately we can estimate the term $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ of $A[n]$ in (5.12) based on past observations. Since the estimate improves with the number of observations, we can get arbitrarily close to this bound provided we have enough spatial observations for each $n$, i.e., provided that $L$ is large enough.
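The two SNR measures $\chi = \sigma_A/\sigma_v$ and $\chi_{\mathrm{fb}}$ in (5.13) can be computed from the model parameters. In this sketch (function names are hypothetical) the steady-state variance $\sigma_A^2 = \mathbf{q}^T \mathbf{P} \mathbf{q}$ is obtained by iterating the discrete Lyapunov recursion $\mathbf{P} = \mathbf{G}\mathbf{P}\mathbf{G}^T + \sigma_u^2 \mathbf{h}\mathbf{h}^T$, which assumes a stable $\mathbf{G}$, and $\sigma_u'$ follows from (5.14).

```python
import numpy as np


def steady_state_cov(G, h, sigma_u, tol=1e-12, max_iter=100000):
    """Steady-state state covariance P = G P G^T + sigma_u^2 h h^T (requires a stable G)."""
    G = np.atleast_2d(np.asarray(G, dtype=float))
    h = np.asarray(h, dtype=float).reshape(-1, 1)
    Qn = sigma_u**2 * (h @ h.T)
    P = Qn.copy()
    for _ in range(max_iter):
        P_next = G @ P @ G.T + Qn
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("no convergence; is G stable?")


def snr_measures(G, h, q, sigma_u, sigma_v):
    """Return (chi, chi_fb) = (sigma_A / sigma_v, sigma_u' / sigma_v), cf. (5.13)-(5.14)."""
    q = np.asarray(q, dtype=float).reshape(-1)
    h_vec = np.asarray(h, dtype=float).reshape(-1)
    P = steady_state_cov(G, h, sigma_u)
    sigma_A = np.sqrt(q @ P @ q)                  # steady-state std of A[n]
    sigma_u_prime = sigma_u * abs(q @ h_vec)      # sqrt(sigma_u^2 q^T h h^T q)
    return sigma_A / sigma_v, sigma_u_prime / sigma_v
```

For an AR(1) signal with $\sqrt{1-\rho^2} = 0.2$, unit variance and $\sigma_v = 0.1$, this gives $\chi = 10$ and $\chi_{\mathrm{fb}} = 2$, illustrating how much smaller the feedback SNR can be for a slowly varying signal.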

For small enough feedback SNR $\chi_{\mathrm{fb}}$ in (5.13), encodings of the form (5.11) may incur only a small loss in performance. However, for high $\chi_{\mathrm{fb}}$ performance degrades very rapidly, as (3.24) reveals. For this reason, we may consider joint use of feedback and pseudo-noise to improve performance for large $\chi_{\mathrm{fb}}$.
5.3.3 Joint Use of Pseudo-noise and Feedback


Joint use of pseudo-noise and feedback can provide significant performance improvements over using feedback or pseudo-noise alone. In this section we consider control inputs of the form

$$ w_\ell[n] = w_{\mathrm{fb}}[n] + \sigma_w\, \tilde{w}_\ell[n] , \tag{5.15a} $$

where the sequences $\tilde{w}_\ell[n]$ are statistically independent IID zero-mean unit-variance Gaussian processes, and the feedback sequence $w_{\mathrm{fb}}[n]$ is to be properly selected as a function of all past observations, i.e.,

$$ w_{\mathrm{fb}}[n] = f\left( \mathbf{y}[k];\ k < n \right) . \tag{5.15b} $$

According to our development in the preceding section, we may use the feedback term (5.15b) to predict and “cancel” out the term $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ from $A[n]$, and then use the pseudo-noise term to optimize the encoding with respect to the novel term. Provided that there are enough sensors to form accurate estimates of $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ from $\mathbf{y}[k]$ for $k < n$, to minimize the encoding loss we simply have to select the pseudo-noise level $\sigma_w$ according to (3.22) with $\sigma_A$ replaced by $\sigma_u'$. The resulting encoding strategy (5.15) can be used to achieve encodings whose information loss (5.9) grows as slowly as quadratically with $\chi_{\mathrm{fb}}$ in (5.13); specifically, at high SNR $\chi_{\mathrm{fb}}$ the encoding performance is given by (3.19) with $\chi$ replaced by $\chi_{\mathrm{fb}}$. As expected, joint use of pseudo-noise and feedback provides advantages over using only feedback or only pseudo-noise; since $\chi_{\mathrm{fb}} \le \chi$ and since $\mathcal{L}_{\mathrm{pn}}(\cdot)$ is an increasing function of its argument, we have

$$ \mathcal{L}_{\mathrm{pn}}(\chi_{\mathrm{fb}}) \le \mathcal{L}_{\mathrm{pn}}(\chi) , \tag{5.16} $$

revealing that the encoding performance of networks exploiting feedback and pseudo-noise is in general superior to that of networks exploiting pseudo-noise alone. In fact, since $\mathcal{L}_{\mathrm{pn}}(\cdot)$ is a strictly increasing function, equality in (5.16) is achieved if and only if $\chi_{\mathrm{fb}} = \chi$, i.e., if and only if $A[n]$ is an IID process. As (3.25) reveals, we must also have

$$ \mathcal{L}_{\mathrm{pn}}(\chi_{\mathrm{fb}}) \le \mathcal{L}_{\mathrm{free}}(\chi_{\mathrm{fb}}) , \tag{5.17} $$

with equality, i.e., with no advantage of joint use of feedback and pseudo-noise over feedback alone, if and only if $\sigma_w^{\mathrm{opt}}(\sigma_u')$ in (3.22) equals zero, i.e., if and only if $\chi_{\mathrm{fb}} < 1/\sigma_{\mathrm{opt}}(1)$.
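The control input (5.15a) combines one feedback value shared by all sensors with independent pseudo-noise at each sensor. The sketch below is illustrative only: the function names are hypothetical, and `A_pred` stands for whatever prediction of the predictable component the host has formed from past bits (for example, $\mathbf{q}^T \mathbf{G}\, \hat{\mathbf{x}}[n-1|n-1]$), which is simply passed in here.

```python
import numpy as np


def joint_control_inputs(w_fb, sigma_w, L, rng):
    """Control inputs (5.15a): common feedback term plus independent pseudo-noise.

    w_fb    : feedback value w_fb[n], identical at all L sensors and computed by
              the host from past bits only, as in (5.15b)
    sigma_w : pseudo-noise standard deviation
    """
    w_tilde = rng.standard_normal(L)        # IID zero-mean unit-variance sequences
    return w_fb + sigma_w * w_tilde


def encode_with_joint_control(A_n, A_pred, sigma_v, sigma_w, L, rng):
    """Encode A[n] with feedback w_fb[n] = -A_pred plus pseudo-noise, then sgn(.)."""
    w = joint_control_inputs(-A_pred, sigma_w, L, rng)
    s = A_n + sigma_v * rng.standard_normal(L)   # sensor measurements (5.2)
    return np.where(s + w >= 0.0, 1, -1)          # bits (5.3)
```

When the prediction cancels the signal, the quantizers operate near their threshold and roughly half of the bits are $+1$, which is the operating point the static-case analysis favors.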

5.3.4  Other  Encoding  Strategies 

We can similarly develop encoding strategies that are based on the use of distinct known control inputs across the network, and their combinations with pseudo-noise and/or feedback-based control inputs. For instance, we can design encodings using control inputs exploiting pseudo-noise, feedback, and known components, which can provide improvements over control inputs based on joint use of pseudo-noise and feedback for large $\chi_{\mathrm{fb}}$. Specifically, by exploiting feedback we can effectively cancel out the term $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$ in (5.12). By associating with the $\ell$th sensor a known predetermined quantizer bias $w_{\mathrm{kn}}[n; \ell]$ and by using pseudo-noise inputs with smaller (optimized) $\sigma_w$, we can obtain performance improvements and make the average information loss grow effectively as slowly as linearly with $\chi_{\mathrm{fb}}$.


5.4  Signal  Estimation 


To illustrate some of the techniques that can be exploited to design effective estimators of time-varying signals based on encodings obtained via quantizer bias control, it is first convenient to consider estimation based on the original unconstrained observations $s_\ell[n]$ in (5.5). Let’s consider estimation of $A[k]$ based on observation of $\{\mathbf{s}[m]\}_{m \le n}$ and, in particular, let’s focus on the case $k = n$. Due to the statistical independence of the IID noise components in $\mathbf{v}[n]$ and the form of (5.5), the sequence

$$ \bar{s}[n] = \frac{1}{L} \sum_{\ell=1}^{L} s_\ell[n] \tag{5.18} $$

forms a sequence of sufficient statistics for estimating $A[n]$ based on $\mathbf{s}[n]$ [1]. Moreover, $\bar{s}[n]$ is the ML estimate of $A[n]$ based on $\mathbf{s}[n]$. Equivalently, we may replace the measurement equations (5.5) by the single measurement equation

$$ \bar{s}[n] = A[n] + \bar{v}[n] , \tag{5.19} $$

where $\bar{v}[n]$ is a zero-mean IID Gaussian process with variance $\sigma_v^2/L$. Since the sequence $\bar{s}[n]$ in (5.18) is a sequence of sufficient statistics for estimating $A[n]$ from $\mathbf{s}[n]$, optimized estimators for the model (5.1), (5.2) are the same as for the model (5.1), (5.19). We may exploit this observation to design signal estimators of $A[n]$ based on the encodings $\mathbf{y}[n]$; specifically, we can replace the $L$ measurement equations (5.4) by a single measurement equation arising from making an estimate of the $n$th sample $A[n]$ based on the $n$th $L \times 1$ observation vector $\mathbf{y}[n]$.
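The reduction (5.18)-(5.19) is easy to check by simulation: averaging the $L$ measurements leaves a single measurement whose noise variance is $\sigma_v^2/L$. The helper names below are hypothetical and serve only as an illustration.

```python
import numpy as np


def sensor_average(s):
    """Sufficient statistic (5.18): average of the L measurements (rows) at each time (column)."""
    return np.mean(np.asarray(s, dtype=float), axis=0)


def averaged_noise_variance(sigma_v, L):
    """Variance sigma_v^2 / L of the equivalent noise in the single measurement equation (5.19)."""
    return sigma_v**2 / L


# Example: L = 50 sensors observing N = 20000 samples of a unit-variance signal
rng = np.random.default_rng(1)
L, N, sigma_v = 50, 20000, 0.1
A = rng.standard_normal(N)
s = A + sigma_v * rng.standard_normal((L, N))    # measurements (5.5), one row per sensor
s_bar = sensor_average(s)
# np.var(s_bar - A) should be close to averaged_noise_variance(sigma_v, L) = 2e-4
```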

In order to reduce the $L$ sensor measurement equations (5.4) to a single equation, we consider the use of the MAP estimate of $A[n]$ based on observation of $\mathbf{y}[n]$. In each particular encoding case our objective is to obtain a single “measurement” equation relating the MAP estimate of the sample $A[n]$ based on the $L \times 1$ vector sample $\mathbf{y}[n]$ and the signal $A[n] \sim \mathcal{N}(0, \sigma_A^2)$, and to use that equation to design an appropriate Kalman filter for signal estimation.

5.4.1  Pseudo-noise  Control  Inputs 

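For pseudo-noise control inputs the MAP estimator of $A[n]$ given $\mathbf{y}[n]$ is given via the EM algorithm (3.27) by replacing $\mathbf{y}_N$ with $\mathbf{y}[n]$ and $N$ with $L$, where $\sigma_A^2$ is the steady-state variance of $A[n]$ (and where a few additional minor modifications are required). Specifically, the resulting algorithm takes the following form:

$$ \hat{A}_{\mathrm{EM}}^{(k+1)}[n] = \left( 1 + \frac{\sigma_\alpha^2}{L\, \sigma_A^2} \right)^{-1} \left[ \hat{A}_{\mathrm{EM}}^{(k)}[n] + \frac{\sigma_\alpha \left[ \mathcal{K}_L(\mathbf{y}[n]) - L\, Q\left( z^{(k)}[n] \right) \right] \exp\left( -\left( z^{(k)}[n] \right)^2 / 2 \right)}{\sqrt{2\pi}\, L\, Q\left( z^{(k)}[n] \right) \left[ 1 - Q\left( z^{(k)}[n] \right) \right]} \right] , \tag{5.20a} $$

where

$$ z^{(k)}[n] = -\frac{\hat{A}_{\mathrm{EM}}^{(k)}[n]}{\sigma_\alpha} , \tag{5.20b} $$

$\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$, $\mathcal{K}_L(\mathbf{y}[n])$ denotes the number of entries of $\mathbf{y}[n]$ equal to $+1$, and where $\hat{A}_{\mathrm{MAP}}[n]$ is given by

$$ \hat{A}_{\mathrm{MAP}}[n] = \lim_{k \to \infty} \hat{A}_{\mathrm{EM}}^{(k)}[n] . \tag{5.20c} $$

For large enough $L$ the MSE loss of this algorithm in terms of estimating $A[n]$ based on $\mathbf{y}[n]$ instead of $\mathbf{s}[n]$ effectively achieves the encoding information loss (5.9). Given that for any given value of $A[n]$ the MAP estimate becomes asymptotically Gaussian with mean $A[n]$ and variance $\mathcal{L}(A[n])\, \sigma_v^2/L$, we may view $\hat{A}_{\mathrm{MAP}}[n]$ as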

$$ y[n] \triangleq \hat{A}_{\mathrm{MAP}}[n] = A[n] + v_y[n] , \tag{5.21} $$

where the sequence $v_y[n]$ is Gaussian with mean zero and variance

$$ \sigma_{v_y}^2[n] = \mathcal{L}(A[n])\, \sigma_v^2 / L . \tag{5.22} $$

Note that the “equivalent” sensor noise variance $\sigma_{v_y}^2[n]$ at time $n$ is a function of the signal value $A[n]$ at time $n$. Assuming that the pseudo-noise power level has been optimally selected, we can approximate the variance of $v_y[n]$ as

$$ \sigma_{v_y}^2 = \mathcal{L}(\chi)\, \sigma_v^2 / L , \tag{5.23} $$

where $\mathcal{L}(\chi)$ is the average loss over all possible values of $A[n]$ for $\chi = \sigma_A/\sigma_v$.
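The EM iteration (5.20) needs only the number of $+1$ bits. The sketch below is an illustration, not the thesis's code: the names `Q` and `map_estimate_em` are hypothetical, the iteration starts from zero, and the optional common offset `w_fb` anticipates the feedback form (5.27). The update is written as $K\phi/Q(z) - (L-K)\phi/Q(-z)$, which is algebraically identical to the fraction in (5.20a) but avoids forming $1 - Q(z)$ by subtraction.

```python
import math
import numpy as np


def Q(z):
    """Gaussian tail probability Q(z) = P(N(0,1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def map_estimate_em(y, sigma_alpha, sigma_A, w_fb=0.0, tol=1e-10, max_iter=500):
    """EM iteration for the MAP estimate of A[n] from bits y_l = sgn(A + w_fb + alpha_l).

    alpha_l ~ N(0, sigma_alpha^2) IID (sensor noise plus any pseudo-noise) and the
    prior is A ~ N(0, sigma_A^2). With w_fb = 0 this follows (5.20a)-(5.20c).
    """
    y = np.asarray(y)
    L = y.size
    K = int(np.count_nonzero(y > 0))                   # number of +1 bits
    shrink = 1.0 + sigma_alpha**2 / (L * sigma_A**2)   # effect of the Gaussian prior
    A = 0.0
    for _ in range(max_iter):
        z = -(A + w_fb) / sigma_alpha
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        up = K / Q(z) if K > 0 else 0.0                # +1 bits: E-step term
        down = (L - K) / Q(-z) if K < L else 0.0       # -1 bits: E-step term
        A_new = (A + sigma_alpha * phi * (up - down) / L) / shrink
        if abs(A_new - A) < tol:
            return A_new
        A = A_new
    return A
```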

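We can then design a Kalman filter² for the model (5.1), (5.21) and (5.23), namely [1],

$$ \hat{\mathbf{x}}[n|n] = \mathbf{G}\, \hat{\mathbf{x}}[n-1|n-1] + \boldsymbol{\mu}[n] \left( y[n] - \mathbf{q}^T \mathbf{G}\, \hat{\mathbf{x}}[n-1|n-1] \right) \tag{5.24a} $$

$$ \boldsymbol{\mu}[n] = \boldsymbol{\Lambda}_x[n|n-1]\, \mathbf{q} \left( \mathbf{q}^T \boldsymbol{\Lambda}_x[n|n-1]\, \mathbf{q} + \sigma_{v_y}^2 \right)^{-1} \tag{5.24b} $$

$$ \boldsymbol{\Lambda}_x[n|n] = \left( \mathbf{I} - \boldsymbol{\mu}[n]\, \mathbf{q}^T \right) \boldsymbol{\Lambda}_x[n|n-1] \tag{5.24c} $$

$$ \boldsymbol{\Lambda}_x[n|n-1] = \mathbf{G}\, \boldsymbol{\Lambda}_x[n-1|n-1]\, \mathbf{G}^T + \sigma_u^2\, \mathbf{h}\, \mathbf{h}^T \tag{5.24d} $$

initialized with $\hat{\mathbf{x}}[-1|-1] = \mathbf{0}$ and $\boldsymbol{\Lambda}_x[-1|-1] = \sigma^2 \mathbf{I}$.

It is worthwhile to note that, in general, $\mathbf{q}^T \boldsymbol{\Lambda}_x[n|k]\, \mathbf{q}$ provides only an estimate of $\lambda_A[n|k]$, the MSE of the estimate of $A[n]$ given all observations up to and including time $k$, since the model used to construct the Kalman filter (5.24) is only an approximate one; cf. Eqns. (5.22) and (5.23).

² In fact, we can also design an Extended Kalman Filter for the original nonlinear state-space model given by (5.1), (5.21) and (5.22).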
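The recursion (5.24) translates line by line into NumPy. In this sketch the function name `kalman_filter` is hypothetical, the scalar measurement noise variance $\sigma_{v_y}^2$ is supplied by the caller (for example from (5.23) or (5.28)), and the initial covariance is taken as a caller-chosen multiple of $\mathbf{I}$.

```python
import numpy as np


def kalman_filter(y, G, h, q, sigma_u, sigma_vy2, P0_scale):
    """Kalman filter (5.24) for x[n] = G x[n-1] + h u[n], y[n] = q^T x[n] + v_y[n].

    sigma_vy2 : (approximate) variance of v_y[n]
    P0_scale  : initial covariance Lambda_x[-1|-1] = P0_scale * I
    Returns the estimates q^T x_hat[n|n] and the model-based MSEs q^T Lambda_x[n|n] q.
    """
    G = np.atleast_2d(np.asarray(G, dtype=float))
    h = np.asarray(h, dtype=float).reshape(-1, 1)
    q = np.asarray(q, dtype=float).reshape(-1, 1)
    R = G.shape[0]
    x = np.zeros((R, 1))                     # x_hat[-1|-1] = 0
    P = P0_scale * np.eye(R)                 # Lambda_x[-1|-1]
    Qn = sigma_u**2 * (h @ h.T)
    A_hat = np.empty(len(y))
    mse = np.empty(len(y))
    for n, y_n in enumerate(y):
        x_pred = G @ x                                                  # G x_hat[n-1|n-1]
        P_pred = G @ P @ G.T + Qn                                       # (5.24d)
        gain = P_pred @ q / (q.T @ P_pred @ q + sigma_vy2).item()       # (5.24b)
        x = x_pred + gain * (y_n - (q.T @ x_pred).item())               # (5.24a)
        P = (np.eye(R) - gain @ q.T) @ P_pred                           # (5.24c)
        A_hat[n] = (q.T @ x).item()
        mse[n] = (q.T @ P @ q).item()
    return A_hat, mse
```

As the text notes, `mse` reflects the approximate model, so it is only an estimate of the actual MSE when $\sigma_{v_y}^2$ is itself an approximation.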




5.4.2  Estimation  via  Feedback 


We can use a similar approach to design control input strategies based on feedback and associated signal estimators. Specifically, we can use the MAP estimate of $A[n]$ based on $\mathbf{y}[n]$ for any control input $w_{\mathrm{fb}}[n]$. To cancel out the term $\mathbf{q}^T \mathbf{G}\, \mathbf{x}[n-1]$, we can select this feedback term as

$$ w_{\mathrm{fb}}[n] = -\mathbf{q}^T \mathbf{G}\, \hat{\mathbf{x}}[n-1|n-1] , \tag{5.25} $$

where $\hat{\mathbf{x}}[n-1|n-1]$ is our estimate of $\mathbf{x}[n-1]$ based on all past observations, i.e., all available observations up to and including time $n-1$. Similarly to (5.21), we may view as our measurement equation


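$$ y[n] \triangleq \hat{A}_{\mathrm{MAP}}[n] = A[n] + v_y[n] , \tag{5.26} $$

where $\hat{A}_{\mathrm{MAP}}[n]$ is given by (5.20a) and (5.20c) with

$$ z^{(k)}[n] = -\frac{\hat{A}_{\mathrm{EM}}^{(k)}[n] + w_{\mathrm{fb}}[n]}{\sigma_\alpha} \tag{5.27} $$

and $\sigma_\alpha = \sigma_v$. Again we approximate the zero-mean strictly white nonstationary noise source $v_y[n]$ with a zero-mean IID process of power level given by

$$ \sigma_{v_y}^2 = \mathcal{L}_{\mathrm{free}}(\chi_{\mathrm{fb}})\, \sigma_v^2 / L . \tag{5.28} $$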

We can then design a Kalman filter for the system model (5.1), (5.26) and (5.28); it is given by (5.24), where $y[n]$ and $\sigma_{v_y}^2$ are instead given by (5.26) and (5.28), respectively.
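The two new ingredients of the feedback estimator, the feedback term (5.25) and the noise variance (5.28), are small computations. In this sketch the function names are hypothetical, and the loss value $\mathcal{L}_{\mathrm{free}}(\chi_{\mathrm{fb}})$ is passed in as a number, since its expression (3.24) is not reproduced here.

```python
import numpy as np


def feedback_term(G, q, x_hat_prev):
    """Feedback control input (5.25): w_fb[n] = -q^T G x_hat[n-1|n-1]."""
    G = np.atleast_2d(np.asarray(G, dtype=float))
    q = np.asarray(q, dtype=float).reshape(-1)
    x_hat_prev = np.asarray(x_hat_prev, dtype=float).reshape(-1)
    return -float(q @ G @ x_hat_prev)


def noise_variance_feedback(loss_free, sigma_v, L):
    """Approximate variance of v_y[n] per (5.28), given the value L_free(chi_fb)."""
    return loss_free * sigma_v**2 / L
```

The feedback value is broadcast to every sensor, and the MAP estimate is then formed with the offset (5.27) before the Kalman update (5.24).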

5.4.3  Estimation  in  Presence  of  Feedback  and  Pseudo-noise 

We can easily extend the estimators of the previous sections to take into account the use of both pseudo-noise and feedback. Specifically, as suggested in Section 5.3.3, we may use feedback of the form (5.25). We may use as our measurement equation (5.26), where $\hat{A}_{\mathrm{MAP}}[n]$ is the MAP estimate of $A[n]$ based on observation of $\mathbf{y}[n]$ and is given by (3.27) with minor modifications. Specifically, it is given by (5.20a), (5.20c), and (5.27) with $\sigma_\alpha = \sqrt{\sigma_v^2 + \sigma_w^2}$. In that case, the power level of the “noise” process $v_y[n]$ is given by

$$ \sigma_{v_y}^2 = \mathcal{L}(A[n], \sigma_\alpha)\, \sigma_v^2 / L , \tag{5.29} $$

which, in the case that the pseudo-noise power level is optimally selected, is given by

$$ \sigma_{v_y}^2 = \mathcal{L}_{\mathrm{pn}}(\chi_{\mathrm{fb}})\, \sigma_v^2 / L \tag{5.30} $$

for large $\chi_{\mathrm{fb}}$. Especially when the pseudo-noise power level is optimally selected, the measurement model (5.26), where $v_y[n]$ is assumed to be an IID process of variance given by (5.29), is a reasonably accurate model for the original measurement equations. The Kalman filtering solution for this approximate model is given by (5.24), where $\sigma_{v_y}^2$ and $y[n]$ are given by (5.29) and (5.26), respectively.

5.5  Encoding  and  Estimation  of  an  AR(1)  process 

As a brief illustration of the construction of the encoding strategies, the associated estimators, and their performance characteristics, we next consider a simple example involving estimation of a first-order AR process given by

$$ A[n] = \rho\, A[n-1] + \sqrt{1 - \rho^2}\, \sigma_A\, u[n] , \tag{5.31} $$

where $u[n]$ is a zero-mean unit-variance IID Gaussian process, and $0 \le \rho \le 1$. As is well known, for the parametric model (5.31), the parameter $\rho$ can be viewed as a rough measure of signal bandwidth; for $\rho = 1$, $A[n]$ in (5.31) reduces to the static case, which we have considered in detail in earlier chapters; for $\rho = 0$, $A[n]$ in (5.31) is a zero-mean IID Gaussian process with power level $\sigma_A^2$. Fig. 5-1 shows a typical sample path for an intermediate value of $\rho$.
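A sample path like the one in Fig. 5-1 can be generated directly from (5.31). In this illustrative sketch (the function name `ar1_path` is hypothetical) the recursion starts from $A[-1] \sim \mathcal{N}(0, \sigma_A^2)$, so the path is stationary from the first sample; the two limiting cases $\rho = 1$ (static) and $\rho = 0$ (IID) fall out directly.

```python
import numpy as np


def ar1_path(rho, sigma_A, N, rng, stationary_start=True):
    """Sample path of (5.31): A[n] = rho A[n-1] + sqrt(1 - rho^2) sigma_A u[n]."""
    A = np.empty(N)
    a = sigma_A * rng.standard_normal() if stationary_start else 0.0   # A[-1]
    b = np.sqrt(1.0 - rho**2) * sigma_A
    for n in range(N):
        a = rho * a + b * rng.standard_normal()
        A[n] = a
    return A


# Example matching the parameters of Fig. 5-1: sqrt(1 - rho^2) = 0.2, sigma_A = 1
rng = np.random.default_rng(0)
path = ar1_path(np.sqrt(1 - 0.2**2), 1.0, 500, rng)
```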

Let’s consider a scenario involving a distributed network of $L$ sensors measuring $A[n]$ in statistically independent IID sensor noises as in (5.2) and employing binary quantizer bias control. As suggested in Section 5.3.3, joint use of feedback and pseudo-noise is in general superior to using feedback alone. This is clearly illustrated in Fig. 5-2, where we consider encodings of the form (5.15) for various $\sigma_w$ levels, for a network of $L = 10^3$ sensors. As the figure reveals, there is an optimal power level in terms of minimizing the associated MSE loss. The optimal power level is in fact very accurately predicted by (3.22) with $\sigma_A$ replaced by $\sigma_u'$ from (5.14).

Figure 5-1: Sample path of an AR(1) process with dynamics given by (5.31), where $\sqrt{1 - \rho^2} = 0.2$, $\sigma_A = 1$.

Fig. 5-3 depicts the performance of this encoding strategy as a function of the “bandwidth” parameter $\rho$. As the figure reveals, in the static case ($\sqrt{1 - \rho^2} = 0$) feedback alone provides the optimal encoding loss ($\approx 2$ dB). At the other extreme, i.e., $\sqrt{1 - \rho^2} = 1$, feedback does not provide any encoding benefits; each signal sample $A[n]$ is independent of all past and future signal samples, so we cannot rely on past encodings to effectively predict any future $A[n]$ samples. On the other hand, suitable use of pseudo-noise across the network can provide performance benefits. For intermediate $\rho$ values, joint use of feedback and pseudo-noise provides performance improvements over using feedback or pseudo-noise alone.
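The complete loop for the AR(1) example (feedback (5.25) plus pseudo-noise (5.15a) at the sensors, the EM-based MAP estimate with offset (5.27) at the host, and a scalar form of the Kalman filter (5.24)) can be simulated in a few dozen lines. This is a minimal sketch and is not claimed to reproduce Figs. 5-2 or 5-3. One assumption differs from the text: instead of the constant variance (5.29)-(5.30), the filter uses a plug-in value, namely the Cramér-Rao bound of the bits evaluated at the current MAP estimate. All function names are hypothetical.

```python
import math
import numpy as np


def Q(z):
    """Gaussian tail probability P(N(0,1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def em_map(K, L, w_fb, sigma_alpha, sigma_A, iters=200, tol=1e-10):
    """EM iteration for the MAP estimate of A[n] from K (+1)-bits out of L, cf. (5.20a), (5.27)."""
    shrink = 1.0 + sigma_alpha**2 / (L * sigma_A**2)
    A = -w_fb                                    # start at the host's prediction
    for _ in range(iters):
        z = -(A + w_fb) / sigma_alpha
        up = K / Q(z) if K > 0 else 0.0
        down = (L - K) / Q(-z) if K < L else 0.0
        A_new = (A + sigma_alpha * phi(z) * (up - down) / L) / shrink
        if abs(A_new - A) < tol:
            return A_new
        A = A_new
    return A


def simulate_ar1_network(rho, sigma_A, sigma_v, sigma_w, L, N, seed=0):
    """Empirical MSE of the closed loop: (5.25) + (5.15a), EM-MAP, scalar Kalman filter."""
    rng = np.random.default_rng(seed)
    b2 = (1.0 - rho**2) * sigma_A**2             # variance of the unpredictable term
    sigma_alpha = math.hypot(sigma_v, sigma_w)
    a = sigma_A * rng.standard_normal()          # A[-1], drawn from the stationary law
    x_hat, P = 0.0, sigma_A**2                   # x_hat[-1|-1], Lambda[-1|-1]
    sq_err = np.empty(N)
    for n in range(N):
        a = rho * a + math.sqrt(b2) * rng.standard_normal()     # A[n], (5.31)
        x_pred, P_pred = rho * x_hat, rho**2 * P + b2            # prediction step
        w_fb = -x_pred                                           # (5.25) with q = 1, G = rho
        w = w_fb + sigma_w * rng.standard_normal(L)              # (5.15a)
        s = a + sigma_v * rng.standard_normal(L)                 # (5.2)
        K = int(np.count_nonzero(s + w >= 0.0))                  # number of +1 bits, (5.3)
        y_n = em_map(K, L, w_fb, sigma_alpha, sigma_A)           # scalar measurement (5.26)
        z = -(y_n + w_fb) / sigma_alpha
        r = sigma_alpha**2 * Q(z) * Q(-z) / (L * phi(z) ** 2)    # plug-in noise variance
        gain = P_pred / (P_pred + r)                             # scalar form of (5.24b)
        x_hat = x_pred + gain * (y_n - x_pred)                   # (5.24a)
        P = (1.0 - gain) * P_pred                                # (5.24c)
        sq_err[n] = (x_hat - a) ** 2
    return float(sq_err.mean())
```

Calling `simulate_ar1_network` with `sigma_w = 0.0` gives the feedback-only configuration, so the same routine can be used to compare the two control strategies for a given $\rho$.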
Figure 5-2: MSE loss in estimating an AR(1) process with dynamics given by (5.31), where $\sqrt{1 - \rho^2} = 0.2$, $\sigma_A = 1$, based on a network of sensors using quantizer bias control according to (5.15), and where $\sigma_v = 0.1$.


Figure 5-3: MSE loss in estimating an AR(1) process with dynamics given by (5.31), as a function of $\sqrt{1 - \rho^2}$ for $\sigma_A = 1$, based on a network of sensors using quantizer bias control, for pseudo-noise (dashed), feedback-based (dash-dot), and jointly optimized pseudo-noise and feedback-based control inputs (solid). Sensor noise level: $\sigma_v = 0.1$.
Chapter 6


Contributions and Future Directions


In  this  thesis  we  have  focused  on  signal  estimation  from  noisy  measurements,  where  system 
constraints  force  us  to  rely  on  a  quantized  description  of  the  noisy  measurements.  We  have 
developed  a  framework  for  designing  encodings  of  the  noisy  measurements  into  efficient 
digitized  descriptions  and  optimized  signal  estimators  from  these  encodings  for  a  number 
of  important  scenarios  with  various  encoder  complexity  characteristics. 

As a main contribution of this thesis, we have introduced encodings of the form of what we refer to as quantizer bias control. For the static signal case, we have developed optimized encodings for a variety of important scenarios that may arise in practice, together with associated estimators which asymptotically achieve the optimal performance from these encodings. Specifically, we have developed a framework for evaluating these quantizer-based systems by means of a figure of merit which we refer to as the information loss; it is defined as the increase in dB that is incurred in the Cramér-Rao bound for unbiased estimates by a particular type of additive control input and a given $M$-level quantizer. In general, for control-free systems the performance rapidly degrades with peak signal-to-noise ratio (SNR) $\chi$, which is defined as the ratio of the parameter dynamic range to the sensor noise power level. In particular, as we have shown, for a wide class of IID sensor noises the worst-case information loss grows faster than $\chi^2$ if no control input is used.

We have considered a number of important scenarios that may arise in practice which differ in terms of the available knowledge about the control waveform for estimation and the associated freedom in the control input selection. If only the statistical characterization of the control input can be exploited for estimation, we have shown that pseudo-noise control inputs can provide significant performance benefits, in the sense that the worst-case information loss can be made to grow as slowly as quadratically with SNR. If knowledge of the particular control input is exploited for estimation, even higher performance can be achieved. In particular, we have developed methods for selecting the control input from a suitably designed class of periodic waveforms, for which the worst-case information loss grows linearly with SNR. Finally, for cases where feedback is available we have developed control waveform selection strategies and corresponding computationally efficient estimators that asymptotically achieve the best possible performance for quantizer-based systems with additive control inputs. Specifically, these estimators achieve the minimum possible information loss for the associated quantizer-based system, which is independent of SNR. It is worth emphasizing that these performance characteristics are exhibited by any $M$-level quantizer and a wide class of IID sensor noises. Furthermore, our methodology easily generalizes to scenarios involving networks of sensors employing quantizer bias control.

For all encoder complexity scenarios we considered, we have shown that optimized encodings have the same asymptotic characteristics even when the figure of merit is average (rather than worst-case) performance, i.e., when there is prior information regarding the relative likelihood of the signal values. Furthermore, these asymptotic performance rates remain unaffected even if the sensor noise power level in the original measurements is unknown.

Although  quantizer  bias  control  encoders  exploiting  feedback  can  be  constructed  whose 
performance  does  not  degrade  with  SNR,  in  general  these  systems  incur  a  small  information 
loss.  This  loss  in  performance  is  an  inherent  limitation  of  all  encoders  employing  quantizer 
bias  control  and  can  only  be  eliminated  by  allowing  more  freedom  in  the  encoder  design.  For 
cases  where  such  freedom  is  available,  we  have  developed  a  framework  for  designing  efficient 
refinable  encoding  descriptions  and  estimators  from  these  descriptions  which  asymptotically 
achieve  the  performance  of  any  estimator  that  could  be  computed  at  the  encoder  from  the 
original  noisy  measurements.  In  the  event  that  the  estimate  computed  at  the  encoder  is 
asymptotically  efficient  with  respect  to  the  original  sensor  measurements,  these  encoder 
and estimator pairs have the attractive property that they achieve asymptotically optimal performance, i.e., the resulting estimate based on the encodings asymptotically achieves the best possible performance based on the original sensor measurements.

A very important extension of the encoding and estimation strategies involves developing efficient encodings of noisy measurements of time-varying information-bearing signals obtained at multiple sensors. Although the framework we have developed for the static case is in general inadequate for efficient encoding in the time-varying case, we have shown that we can exploit the key encoding principles used in the static analysis to develop a rich class of encoding strategies for time-varying signals. In particular, pseudo-noise, deterministic, and feedback-based control inputs can be effectively combined to provide improved performance over encoding strategies relying on only one of these types of control inputs. We have shown that in all cases performance is intricately linked to an appropriate measure of signal-to-noise ratio which depends on the particular signal characteristics and the allowed freedom in the encoder design. In the same context, we have developed estimators that make use of static case estimation principles to transform the multi-sensor measurements into an equivalent sufficient “single measurement” characterization, which enables the use of a Kalman-filter-based approach to estimation.

Although we have sketched a number of optimized strategies that can be used to encode and estimate noisy time-varying signals, there are a number of important issues that must be successfully addressed to make such schemes practical in the context of distributed sensor networks. Typical issues that may arise in practical wireless sensor networks include inherent delays in all encoding strategies that exploit feedback, as well as the signal variability that is often exhibited across a network of this form.


6.1  Future  Directions 

While a large class of problems has been addressed in this thesis, there are a number of very important extensions that warrant further investigation. Indeed, some of these problems have been identified within the appropriate thesis chapters. However, there are a number of other important future directions that are immediately suggested by this work, as well as potential connections with other important problems arising in various other areas of research.

In the context of parameter estimation based on encodings via quantizer bias control, for instance, it is important to study the performance that is achievable based on finite-length observation windows. Such analysis may be beneficial for a number of applications involving signal quantizers. In addition, in most of our analysis we have assumed that the sensor noises are IID processes. However, in many applications sensor noise samples are temporally or even spatially correlated.

An  interesting  future  direction  pertains  to  extending  the  optimized  higher-complexity 
encoding  schemes  we  have  developed  in  Chapter  4  in  the  context  of  time-varying  signals.  An 
intriguing  question  pertains  to  determining  the  best  possible  performance  that  is  achievable 
by  any  such  system,  as  well  as  the  magnitude  of  the  performance  losses  introduced  by 
constraining  the  encoding  strategy  to  the  simpler  quantizer  bias  control  methods  we  have 
developed. 

The framework we have introduced may potentially provide insight in many other research areas. Indeed, this is one of the potentially most fascinating directions for future work. For instance, the framework we have developed appears naturally suited for evaluating A/D conversion of noisy analog signals. In this case, the A/D converter has the dual function of removing noise from the noisy analog signal and of constructing an accurate digitized estimate of the analog signal. Indeed, some of the systems we have developed in this thesis may be useful in designing high bit-rate A/D converter arrays. However, the practical constraints that dictate the design of these systems may differ, in general, from the ones that we have considered in depth in this thesis [20, 21, 29].

Dithered quantizers find use in a number of other applications, such as reconstruction of bandlimited signals via coarse oversampling [36], and halftoning techniques for images [25, 27]. The objective in halftoning is to add pseudorandom patterns to an image signal before coarse quantization as a method of removing the visual artifacts that arise from coarse quantization of image areas exhibiting small signal variation. A number of halftoning techniques, e.g., [27], can be viewed as parallels of pseudo-noise quantizer-bias control encodings of the original image into coarsely quantized pixels, with the additional constraint that the “estimator” to be used is our visual system. Further connections between halftoning techniques and the systems we have developed in this thesis in the context of constrained signal estimation have yet to be explored.

Perhaps the most exciting and fruitful future directions of this thesis pertain to finding connections and forming ties between this work and other important problems that arise in other disciplines in science and engineering.
Appendix A


A.1 Worst-Case Information Loss for Control-free Signal Quantizers


In this appendix we show that the worst-case information loss of any signal quantizer grows faster than $\chi^2$ for large $\chi$ in the absence of a control input. We first consider the case $M = 2$ and show by contradiction that $\mathcal{L}^{\text{free}}_{\max}(\chi) \neq o(\chi^2)$ as $\chi \to \infty$, i.e., we show that

$$\lim_{\chi \to \infty} \frac{\mathcal{L}^{\text{free}}_{\max}(\chi)}{\chi^2} = 0 \tag{A.1}$$

cannot be true. Letting $\chi \to \infty$ is equivalent to fixing $\Delta$ and letting $\sigma_v \to 0^+$, since the control-free information loss for $M = 2$ is completely characterized by $\chi$. Let $B_{\max}(\Delta, \sigma_v; y)$ denote the worst-case Cramér-Rao bound for estimating $A$ from one sample of the IID sequence $y[n]$, for $|A| < \Delta$ and noise level $\sigma_v$. Then, (A.1) implies that

$$\lim_{\sigma_v \to 0^+} B_{\max}(\Delta, \sigma_v; y) = 0, \tag{A.2}$$

where we used (2.4), (2.7) and (2.8). However, (A.2) suggests that, as $\sigma_v \to 0^+$, we can estimate any $A$ in $(-\Delta, \Delta)$ with infinite accuracy from a one-bit observation $y[n]$, which is not possible. Thus, (A.1) is false, i.e., $\mathcal{L}^{\text{free}}_{\max}(\chi)$ has to grow at least as fast as $\chi^2$.

Similarly, we can also show that $\mathcal{L}^{\text{free}}_{\max}(\chi)$ grows faster than $\chi^2$, in the sense that $\mathcal{L}^{\text{free}}_{\max}(\chi) \neq O(\chi^2)$. We show this by first assuming that $\mathcal{L}^{\text{free}}_{\max}(\chi) = O(\chi^2)$, i.e., that we can find $D < \infty$ and $\chi_0$ such that for $\chi > \chi_0$ we have $\mathcal{L}^{\text{free}}_{\max}(\chi) < D\chi^2$, and arriving at a contradiction. The condition $\mathcal{L}^{\text{free}}_{\max}(\chi) = O(\chi^2)$ is equivalent to the statement that there exists $D < \infty$ such that

$$\limsup_{\chi \to \infty} \frac{\mathcal{L}^{\text{free}}_{\max}(\chi)}{\chi^2} = D. \tag{A.3}$$

Again using (2.4) and (2.7)-(2.8) in (A.3), we obtain the following equivalent statement,

$$\limsup_{\sigma_v \to 0^+} B_{\max}(\Delta, \sigma_v; y) = D', \tag{A.4}$$

where $D' < \infty$. Since the sequence $y[n]$ is IID, (A.4) implies that as $\sigma_v \to 0^+$, the Cramér-Rao bound $B(A; \mathbf{y}^N)$ is upper-bounded by $D'/N$, which goes to 0 as $N \to \infty$. However, for any $A \neq 0$, in the limit $\sigma_v \to 0^+$ we have $y[n] = \operatorname{sgn}(A)$ with probability 1 for all $n$, which in turn implies that $B(A; \mathbf{y}^N)$ cannot go to 0 as $N \to \infty$. Thus, we must have $D' = \infty$ in (A.4), which proves that the control-free worst-case information loss is not $O(\chi^2)$.

We can show that $\mathcal{L}^{\text{free}}_{\max}(\chi) \neq O(\chi^2)$ for signal quantizers with $M > 2$ by using our results for $M = 2$. Specifically, if $\Delta$ is fixed, in which case $\chi \to \infty$ is equivalent to $\sigma_v \to 0^+$, the arguments used for the $M = 2$ case still apply with minor modifications. Next consider fixing $\sigma_v$, in which case $\chi \to \infty$ is equivalent to $\Delta \to \infty$. As usual, let $X_1, X_2, \ldots, X_{M-1}$ denote the quantizer thresholds. By rescaling by $1/\Delta$, this problem can be mapped to an equivalent one where $\Delta' = 1$, $\sigma_v' = \sigma_v/\Delta \to 0^+$, and where the new quantizer thresholds are $X_1/\Delta, X_2/\Delta, \ldots, X_{M-1}/\Delta$. The arguments used to show that $\mathcal{L}^{\text{free}}_{\max}(\chi) \neq O(\chi^2)$ in the $M = 2$ case still apply in this case with minor modifications.
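As a numerical illustration of this growth, consider Gaussian sensor noise and a symmetric two-level quantizer with its threshold at zero. The sketch below assumes the standard one-sample bound $B_N(A;\sigma) = 2\pi\sigma^2 Q(A/\sigma)Q(-A/\sigma)e^{A^2/\sigma^2}$ for this case, an unquantized per-sample bound $B(A;s) = \sigma_v^2$, and that the worst case over $|A| < \Delta$ is approached as $|A| \to \Delta$, so that $\mathcal{L}^{\text{free}}_{\max}(\chi) = B_N(\chi; 1)$. The helper names are illustrative only.

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def crb_one_bit(a: float, sigma: float = 1.0) -> float:
    """One-sample CRB for A from y = sgn(A + v), v ~ N(0, sigma^2)."""
    z = a / sigma
    return 2.0 * math.pi * sigma**2 * q_func(z) * q_func(-z) * math.exp(z * z)

def control_free_worst_case_loss(chi: float) -> float:
    """Worst-case loss B_N(Delta; sigma) / sigma^2 with Delta = chi, sigma = 1."""
    return crb_one_bit(chi, 1.0)

for chi in (2.0, 4.0, 6.0, 8.0):
    loss = control_free_worst_case_loss(chi)
    print(f"chi={chi:4.1f}  loss={loss:10.3e}  loss/chi^2={loss / chi**2:10.3e}")
```

The ratio to $\chi^2$ keeps increasing, consistent with $\mathcal{L}^{\text{free}}_{\max}(\chi) \neq O(\chi^2)$; in this Gaussian case the growth is in fact roughly like $e^{\chi^2/2}$.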


A.2 Worst-Case Information Loss for Known Control Inputs

We first show that for any known control input scenario, the worst-case information loss grows at least as fast as $\chi$. This is true for any sensor noise distribution and for any $M \geq 2$. For convenience, we denote by $p_w(\cdot)$ the empirical probability density function of the known sequence $w[n]$ [26]. The associated Cramér-Rao bound for estimating $A$ based on $y[n]$ for a particular $p_w(\cdot)$ is given by

$$B(A; \mathbf{y}^N, p_w(\cdot)) = \frac{1}{N}\left(E\left[\{B(A + w; y)\}^{-1}\right]\right)^{-1}, \tag{A.5}$$

where the expectation is with respect to $p_w(\cdot)$. For instance, if the periodic sequence (2.26) is represented by an empirical PDF consisting of $K$ Dirac delta functions located at $w[n]$ for $n = 0, 1, \ldots, K-1$, each with area $1/K$, then (A.5) and (2.24) are equivalent.

For convenience, we consider the inverse of the Cramér-Rao bound in (A.5), namely, the Fisher information of $A$ given $y[n]$. We denote the Fisher information in the control-free case by $\mathcal{I}(A; y)$. The worst-case Fisher information $\mathcal{I}_{\min}(\Delta; p_w(\cdot))$ for an input with an empirical PDF $p_w(\cdot)$ is defined as

$$\mathcal{I}_{\min}(\Delta; p_w(\cdot)) = \inf_{|A| < \Delta} E\left[\mathcal{I}(A + w; y)\right],$$

where the expectation is with respect to $p_w(\cdot)$. Consider the optimal selection of $p_w(\cdot)$, which maximizes $\mathcal{I}_{\min}(\Delta; p_w(\cdot))$, i.e.,

$$\mathcal{I}_{\text{opt}}(\Delta) = \max_{p_w(\cdot)} \mathcal{I}_{\min}(\Delta; p_w(\cdot)).$$

The growth of the optimal worst-case information loss equals the growth of the inverse of the optimal worst-case Fisher information defined above.

We will make use of the fact that the control-free worst-case information loss grows strictly faster than $\chi^p$ for $p < 2$ (cf. the generalization of (A.1) to quantizers with $M > 2$). Without loss of generality we may set $\sigma_v = 1$, in which case $\Delta = \chi$. Since $B(A; s)$ is independent of $A$ (and thus of $\Delta$), the control-free worst-case Fisher information of $A$ based on $y[n]$ decays faster than $1/\Delta^p$ for any $p < 2$ as $\Delta$ increases. Thus, there exist $D > 0$ and $\delta > 0$ such that for any $|A| > \delta$

$$\mathcal{I}(A; y) = [B(A; y)]^{-1} < \min\left\{D\,|A|^{-p},\ [B(A; s)]^{-1}\right\}, \tag{A.6}$$

for any given $p < 2$. For convenience, we pick $p$ so that $1 < p < 2$. Also, let

$$P_k(A; p_w(\cdot)) = \int_{k\delta \leq |w + A| < (k+1)\delta} p_w(w)\, dw.$$

For any empirical PDF $p_w(\cdot)$ and any $\Delta$ satisfying $\Delta > \delta$, we must have

$$\inf_{|A| < \Delta} P_k(A; p_w(\cdot)) \leq \frac{2\delta}{\Delta}. \tag{A.7}$$

We can establish (A.7) via proof by contradiction; if the inequality in (A.7) is reversed, for any $A$ in $(-\Delta, \Delta)$ we have

$$P_k(A; p_w(\cdot)) > \frac{2\delta}{\Delta}. \tag{A.8}$$

Let $A_j = j\delta$ for $j = 0, \pm 1, \ldots, \pm j_0$, where $j_0$ is the largest index $j$ satisfying $A_j < \Delta$. Note that $j_0 \geq \Delta/(2\delta)$. Applying (A.8) for $A = A_j$ and summing over all $j$ yields

$$\sum_{j=-j_0}^{j_0} P_k(A_j; p_w(\cdot)) > (2j_0 + 1)\frac{2\delta}{\Delta}, \tag{A.9}$$

which is a contradiction, since the left-hand side of (A.9) is upper-bounded by $2\int_w p_w(w)\,dw = 2$ (for any fixed $w$, at most two of the points $w + A_j$ satisfy $k\delta \leq |w + A_j| < (k+1)\delta$), while $(2j_0 + 1)(2\delta)/\Delta > 2$. We can similarly derive the following generalization of (A.7)

$$\inf_{|A| < \Delta} \sum_k \beta_k P_k(A; p_w(\cdot)) \leq \frac{2\delta}{\Delta}\sum_k \beta_k, \tag{A.10}$$

where $\beta_k \geq 0$ and at least one of the $\beta_k$'s is non-zero. We have


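$$\begin{aligned}
\mathcal{I}_{\text{opt}}(\Delta) &= \max_{p_w(\cdot)} \inf_{|A| < \Delta} \sum_{k=0}^{\infty} \int_{k\delta \leq |w+A| < (k+1)\delta} p_w(w)\, \mathcal{I}(A + w; y)\, dw \\
&\leq \max_{p_w(\cdot)} \inf_{|A| < \Delta} \left([B(0; s)]^{-1} P_0(A; p_w(\cdot)) + \sum_{k=1}^{\infty} \frac{D}{(k\delta)^p}\, P_k(A; p_w(\cdot))\right) \qquad \text{(A.11a)} \\
&\leq \frac{2\delta}{\Delta}\left([B(0; s)]^{-1} + \frac{D}{\delta^p}\sum_{k=1}^{\infty} k^{-p}\right) \qquad \text{(A.11b)} \\
&\leq \frac{C}{\Delta}, \qquad \text{(A.11c)}
\end{aligned}$$

where $C < \infty$, since $\sum_k k^{-p}$ is a convergent series for $p > 1$. To obtain (A.11a) and (A.11b) we used (A.6) and (A.10), respectively. As (A.11c) reveals, for large $\Delta$ the optimal worst-case information loss grows at least as fast as $\chi$ (since $\chi = \Delta$ for $\sigma_v = 1$).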

We next show that simple periodic control input schemes can be constructed for which the worst-case information loss (for $N \to \infty$) grows linearly with $\chi$. It suffices to consider signal quantizers with $M = 2$, since signal quantizers with $M > 2$ provide additional information and would thus perform at least as well. In particular, we next show that $K$-periodic waveforms given by (2.26), where $K$ is given by (2.27) for a fixed $\lambda > 0$, achieve the optimal growth rate for any admissible sensor noise and a symmetric two-level quantizer. Let $B(A; \sigma_v)$ denote the Cramér-Rao bound (2.14) with $a$ replaced by $v$. Note that since

$$B(A; \sigma_v) = \sigma_v^2\, B(A/\sigma_v; 1), \tag{A.12}$$

we also have $B_{\max}(\Delta; \sigma_v) = \sigma_v^2\, B_{\max}(\Delta/\sigma_v; 1)$, which in conjunction with (2.8) reveals that the associated information loss is completely characterized by the ratio $\chi = \Delta/\sigma_v$. Since $K$ also depends solely on $\chi$, we may fix $\Delta = 1$ without loss of generality. Note that the class (2.26) remains invariant to changes in $\sigma_v$. Hence, we may use $w[n; K]$ to denote the unique $K$-periodic sequence from the class (2.26) corresponding to $\Delta = 1$. For $\sigma_v < \lambda$, we have $3\lambda/\sigma_v > K$, and

$$\begin{aligned}
B_{\max}(1; \sigma_v) &= \sup_{A \in (-1, 1)} \frac{K}{\sum_{n=1}^{K} [B(A + w[n; K]; \sigma_v)]^{-1}} \qquad \text{(A.13a)} \\
&\leq K \sup_{A \in (-1,1)}\ \min_{n \in \{1, \ldots, K\}} B(A + w[n; K]; \sigma_v) \qquad \text{(A.13b)} \\
&\leq 3\lambda\sigma_v \sup_{A' \in (-1/\sigma_v,\, 1/\sigma_v)}\ \min_{n \in \{1, \ldots, K\}} B(A' + w'[n; K]; 1) \qquad \text{(A.13c)} \\
&\leq 3\lambda\sigma_v \sup_{A \in (-1/\lambda,\, 1/\lambda)} B(A; 1), \qquad \text{(A.13d)}
\end{aligned}$$

where $w'[n; K] = w[n; K]/\sigma_v$, and where we used (A.12) to obtain (A.13c) from (A.13b). To verify (A.13d) from (A.13c), note that for any fixed $A'$ in $(-1/\sigma_v, 1/\sigma_v)$, the minimum of $B(A' + w'[n; K]; 1)$ over $n$ is upper-bounded by $B(A' + w'[n'; K]; 1)$, where $n'$ is the value of $n$ for which $|A' + w'[n; K]|$ is smallest. Since the spacing $\delta_{w'}$ of the sawtooth waveform $w'[n; K]$ satisfies $\delta_{w'} = \delta_w/\sigma_v < 2/\lambda$, $|A' + w'[n'; K]|$ is upper-bounded by $\delta_{w'}/2 < 1/\lambda$ for any $|A'| < 1/\sigma_v$, verifying (A.13d). Since $B(A; s) \propto \sigma_v^2$ from (2.8), (A.13d) implies that the worst-case information loss for known $w[n]$ given by (2.26) with $K$ given by (2.27) grows no faster than $1/\sigma_v$ (i.e., linearly in $\chi$) for small $\sigma_v$. Hence, this control selection method achieves the optimal worst-case information loss growth rate.
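The linear growth can be illustrated numerically for Gaussian noise with $\Delta = 1$. Since the exact forms of (2.26) and (2.27) are not restated in this appendix, the sketch below assumes, for illustration only, an evenly spaced sawtooth $w[n; K] = -1 + (2n - 1)/K$, $n = 1, \ldots, K$, with $K = \lceil \lambda\chi \rceil + 1$. This choice has spacing below $2\sigma_v/\lambda$ and satisfies $K < 3\lambda\chi$ whenever $\lambda\chi \geq 1$, which is all the argument above uses. The supremum in (A.13a) is approximated on a grid of $A$ values, and the loss is normalized by the Gaussian unquantized bound $\sigma_v^2$; function names are hypothetical.

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_one_bit(a: float, sigma: float) -> float:
    """[B(a; sigma)]^{-1} for y = sgn(a + v), v ~ N(0, sigma^2)."""
    z = a / sigma
    if abs(z) > 30.0:
        return 0.0  # below ~1e-190: negligible next to the nearest sawtooth term
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return phi * phi / (sigma**2 * q_func(z) * q_func(-z))

def sawtooth(k: int):
    """Assumed K-periodic control w[n; K] = -1 + (2n - 1)/K, n = 1..K (Delta = 1)."""
    return [-1.0 + (2 * n - 1) / k for n in range(1, k + 1)]

def worst_case_loss(chi: float, lam: float = 1.0, grid: int = 2001) -> float:
    """Grid approximation of B_max(1; sigma_v) / sigma_v^2 with sigma_v = 1/chi."""
    sigma = 1.0 / chi
    k = math.ceil(lam * chi) + 1
    w = sawtooth(k)
    worst = 0.0
    for j in range(grid):
        a = -1.0 + 2.0 * (j + 0.5) / grid  # interior points of (-1, 1)
        mean_info = sum(fisher_one_bit(a + wn, sigma) for wn in w) / k
        worst = max(worst, 1.0 / mean_info)  # K / sum of inverse bounds, as in (A.13a)
    return worst / sigma**2

for chi in (5.0, 10.0, 20.0, 40.0):
    loss = worst_case_loss(chi)
    print(f"chi={chi:5.1f}  loss={loss:8.3f}  loss/chi={loss / chi:6.3f}")
```

By the chain (A.13b)-(A.13d), each computed loss stays below $K\,B_N(1/\lambda; 1)$, so the ratio to $\chi$ remains bounded, whereas the control-free worst-case loss at the same $\chi$ is far larger.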

We next determine the optimal $\lambda$ in (2.27) for the case where $v[n]$ is Gaussian with variance $\sigma_v^2$. We use $B_N(x; \chi, K)$ to denote the Cramér-Rao bound (2.24) for $A = x\Delta$, in order to make its dependence on $\chi$ and on the period $K$ in (2.26) explicit. The optimality of (2.27) suggests that $K_{\text{opt}}$ from (2.25) is a non-decreasing function of $\chi$ for large $\chi$. Indeed, there is a sequence $\chi_k$, $k \geq 3$, such that $K_{\text{opt}}(\chi) = k$ if $\chi_k < \chi < \chi_{k+1}$. If $\chi = \chi_k$, both $K = k$ and $K = k + 1$ minimize (2.25), i.e.,

$$\sup_{x \in (-1,1)} B_N(x; \chi_k, k) = \sup_{x \in (-1,1)} B_N(x; \chi_k, k+1). \tag{A.14}$$

For large $k$ the left-hand side of (A.14) is maximized at $x = 1$ (i.e., $A = \Delta$ in (2.24)), while the right-hand side is maximized at $x = 1 - d(\chi_k; k)/2$, with $d(\cdot\,;\cdot)$ given by (2.28). Assuming that $d_{\text{opt}}(\chi)$ in (2.29) converges to a limit for large $\chi$, i.e., that $d_\infty = \lim_{\chi \to \infty} d_{\text{opt}}(\chi)$ exists, (A.14) reduces to

$$\sum_{n=-1}^{\infty} \left[B_N\big((n + \tfrac{1}{2})\, d_\infty; 1\big)\right]^{-1} = \sum_{n=0}^{\infty} \left[B_N(n\, d_\infty; 1)\right]^{-1}, \tag{A.15}$$

where $B_N(A; \sigma)$ denotes $B(A; \sigma)$ for $v[n]$ Gaussian, and is given by (2.16) for $\sigma_a = \sigma$. Both infinite series in (A.15) are convergent; in fact, only a few terms of each series are required to obtain an accurate estimate of $d_\infty$, such as the one given in (2.31). Using $d_\infty$ from (2.31) in conjunction with (A.14) and (2.24) yields (2.32). Similar results hold for non-Gaussian sensor noise PDFs. Specifically, a relation of the form (A.15) holds for $d_\infty$ defined in (2.30), where $B_N(\cdot\,;\cdot)$ is replaced by the associated $B(\cdot\,;\cdot)$. The resulting infinite series in (A.15) are both convergent, since their terms decay faster than $1/n^p$ for $1 < p < 2$ (recall that $B(A; y)$ grows faster than $A^p$ for the control-free scenario). Clearly, the value of $d_\infty$ depends on the particular noise PDF.
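Equation (A.15) can be solved numerically for the Gaussian case by bisection. The sketch below assumes $[B_N(A;1)]^{-1} = \phi(A)^2/[Q(A)Q(-A)]$, the standard Fisher information of a one-bit observation of $A + v$, $v \sim \mathcal{N}(0,1)$, with threshold at zero; it truncates both series once the argument exceeds a cutoff where the terms are negligible, and it takes the bracket $[0.1, 10]$ as an assumption. It illustrates how (A.15) can be solved and does not reproduce the value reported in (2.31).

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_one_bit(a: float) -> float:
    """[B_N(a; 1)]^{-1}: Fisher information of sgn(a + v), v ~ N(0, 1)."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return phi * phi / (q_func(a) * q_func(-a))

def balance(d: float, cutoff: float = 12.0) -> float:
    """Left-hand side minus right-hand side of (A.15); |argument| > cutoff dropped."""
    lhs, n = 0.0, -1
    while abs((n + 0.5) * d) <= cutoff:
        lhs += fisher_one_bit((n + 0.5) * d)
        n += 1
    rhs, n = 0.0, 0
    while n * d <= cutoff:
        rhs += fisher_one_bit(n * d)
        n += 1
    return lhs - rhs

def solve_d_inf(lo: float = 0.1, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Bisection for a root of balance(d); requires a sign change on [lo, hi]."""
    f_lo = balance(lo)
    if f_lo * balance(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = balance(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

print(f"d_inf (Gaussian, numerical) = {solve_d_inf():.6f}")
```

Near the root only a handful of terms of either series are non-negligible, in line with the remark that a few terms suffice.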

Extensions of the preceding control selection strategies can be developed that achieve the optimal growth rate of the worst-case information loss for finite $N$. Let $\mathbf{w}^N$ denote the control vector associated with the finite-$N$ strategy, which is assumed known for estimation. Given a set of $w[n]$ and $K$ selected according to the infinite-$N$ scheme, a finite-$N$ method that achieves the same information loss for any $A$ selects $\mathbf{w}^N$ randomly from a set of $K$ equally likely vectors $\mathcal{W}(N, K) = \{\mathbf{w}_i^N : 1 \leq i \leq K\}$, where the $n$th element of the $N \times 1$ vector $\mathbf{w}_i^N$ is given by $w_i[n] = w[iN + n]$.

A.3 Information Loss for Signal Quantizers with M → ∞

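We consider a uniform quantizer with $M = 2(K + 1)$ levels. Given $K$, we select the quantizer thresholds $X_k = k\,x/\sqrt{K}$, where $k = -K, \ldots, K$, and $x > 0$. For convenience, we let $X_{-K-1} = -\infty$ and $X_{K+1} = \infty$. We next examine the Cramér-Rao bound (2.12) for $w[n] = 0$, where $v[n]$ is admissible. We may rewrite (2.12) as

$$B(A; \mathbf{y}^N) = \frac{1}{N}\left[\sum_{k=-K}^{K+1} \xi_k\right]^{-1}, \tag{A.16a}$$

where

$$\xi_k = \frac{\left[p_v(X_k - A) - p_v(X_{k-1} - A)\right]^2}{C_v(X_{k-1} - A) - C_v(X_k - A)}. \tag{A.16b}$$

Note that as $K \to \infty$ the threshold spacing $x/\sqrt{K}$ vanishes while the range $[-x\sqrt{K}, x\sqrt{K}]$ covered by the thresholds grows without bound; hence both $\xi_{-K} \to 0$ and $\xi_{K+1} \to 0$. By letting $m_k = (X_k + X_{k-1})/2 - A$, for large $K$ and for $k = -K+1, \ldots, K$ we have

$$p_v(X_k - A) - p_v(X_{k-1} - A) \approx p_v'(m_k)\, \frac{x}{\sqrt{K}}, \qquad C_v(X_{k-1} - A) - C_v(X_k - A) \approx p_v(m_k)\, \frac{x}{\sqrt{K}},$$

which imply that

$$\xi_k \approx \frac{[p_v'(m_k)]^2}{p_v(m_k)}\, \frac{x}{\sqrt{K}}. \tag{A.17}$$

Approximation (A.17) becomes an equality as $K \to \infty$. Letting $K \to \infty$ in (A.16) and using (A.17) yields

$$\lim_{K \to \infty} B(A; \mathbf{y}^N) = \frac{1}{N}\left[\lim_{K \to \infty}\left(\xi_{-K} + \xi_{K+1} + \sum_{k=-K+1}^{K} \xi_k\right)\right]^{-1} = \frac{1}{N}\left[\int_{-\infty}^{\infty} \frac{[p_v'(t - A)]^2}{p_v(t - A)}\, dt\right]^{-1} = B(A; \mathbf{s}^N).$$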
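The convergence in (A.16)-(A.17) can be checked numerically. The sketch below assumes Gaussian $v[n] \sim \mathcal{N}(0,1)$, for which the unquantized per-sample Fisher information $[N B(A;\mathbf{s}^N)]^{-1}$ equals 1, and evaluates the per-sample Fisher information $\sum_k \xi_k$ of (A.16) for the thresholds $X_k = k\,x/\sqrt{K}$ with $x = 1$. Function names are illustrative.

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x); handles +/-inf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pdf(x: float) -> float:
    """Standard Gaussian density; 0.0 at +/-inf."""
    if not math.isfinite(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cell_prob(lo: float, hi: float) -> float:
    """Pr(lo < v <= hi) for v ~ N(0, 1), written to avoid cancellation."""
    if lo >= 0.0:
        return q_func(lo) - q_func(hi)
    if hi <= 0.0:
        return q_func(-hi) - q_func(-lo)
    return 1.0 - q_func(hi) - q_func(-lo)

def quantizer_fisher(thresholds, a: float) -> float:
    """Per-sample Fisher information sum_k xi_k of (A.16b), sigma_v = 1."""
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        prob = cell_prob(lo - a, hi - a)
        if prob > 0.0:
            total += (pdf(hi - a) - pdf(lo - a)) ** 2 / prob
    return total

def uniform_thresholds(k_max: int, x: float = 1.0):
    """Thresholds X_k = k x / sqrt(K), k = -K, ..., K (M = 2(K + 1) levels)."""
    return [k * x / math.sqrt(k_max) for k in range(-k_max, k_max + 1)]

for k_max in (1, 4, 16, 64, 256):
    info = quantizer_fisher(uniform_thresholds(k_max), a=0.3)
    print(f"K={k_max:4d}  Fisher information={info:.6f}  (unquantized: 1)")
```

As $K$ grows the printed values approach 1, mirroring $\lim_{K \to \infty} B(A; \mathbf{y}^N) = B(A; \mathbf{s}^N)$; the same helper with a single threshold at 0 and $A = 0$ returns $2/\pi$, the familiar one-bit value.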


A.4 Asymptotic Efficiency of ML Estimator for the Case M = 2

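In this appendix we show that $\hat{A}_{\text{ML}} = \hat{A}_{\text{ML}}(\mathbf{y}^N; \Delta)$ given by (2.40)-(2.42) achieves (2.14) for large $N$, if $a[n]$ is admissible. Let $k$ denote the binomial random variable $k(\mathbf{y}^N) = \mathcal{K}_1(\mathbf{y}^N)/N$. Then,

$$\hat{A}_{\text{ML}} = g(k) = \begin{cases} -\Delta & \text{if } k < C_a(\Delta) \\ -C_a^{-1}(k) & \text{if } C_a(\Delta) \leq k \leq C_a(-\Delta) \\ \Delta & \text{if } k > C_a(-\Delta) \end{cases} \tag{A.18}$$

For large $N$ the following approximation is valid in the cumulative sense

$$k \sim \mathcal{N}(p, \sigma_N^2), \tag{A.19}$$

where $p = C_a(-A)$ and $\sigma_N = \sqrt{p(1-p)/N}$. Since $g(\cdot)$ is invertible on $(C_a(\Delta), C_a(-\Delta))$ ($C_a(\cdot)$ is strictly monotone almost everywhere), the PDFs of $\hat{A}_{\text{ML}}$ and $k$ are related as follows [26]

$$p_{\hat{A}_{\text{ML}}}(\hat{A}) = \begin{cases} \delta(\hat{A} - \Delta)\, Q(\sqrt{N}\beta_+) & \text{if } \hat{A} = \Delta \\ p_k\big(C_a(-\hat{A})\big)\, p_a(-\hat{A}) & \text{if } -\Delta < \hat{A} < \Delta \\ \delta(\hat{A} + \Delta)\, Q(\sqrt{N}\beta_-) & \text{if } \hat{A} = -\Delta \end{cases} \tag{A.20}$$

where

$$\beta_+ = \frac{C_a(-\Delta) - p}{\sqrt{p(1-p)}}, \qquad \beta_- = \frac{p - C_a(\Delta)}{\sqrt{p(1-p)}}. \tag{A.21}$$

Note that, in addition to its continuous part on $(-\Delta, \Delta)$, the PDF of $\hat{A}_{\text{ML}}$ in (A.20) contains Dirac delta functions at $\pm\Delta$.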

We first consider $p_{\hat{A}_{\text{ML}}}(\hat{A})$ for $|\hat{A}| < \Delta$. If $N$ is large enough that (A.19) is valid and also $\sigma_N \ll \sigma_a$, the following approximations for $p_{\hat{A}_{\text{ML}}}(\hat{A})$ are valid in the regime $(-\Delta, \Delta)$, in the sense that for any $\hat{A}$ in $(-\Delta, \Delta)$ the values of the corresponding cumulative distribution functions are approximately equal (the approximation generally improves as $N$ increases):

$$\begin{aligned}
p_{\hat{A}_{\text{ML}}}(\hat{A}) &\approx \frac{1}{\sqrt{2\pi}\,\sigma_N} \exp\left(-\frac{\big(C_a(-\hat{A}) - C_a(-A)\big)^2}{2\sigma_N^2}\right) p_a(-\hat{A}) \qquad \text{(A.22a)} \\
&\approx \frac{1}{\sqrt{2\pi}\,\sigma_N} \exp\left(-\frac{\big(C_a(-\hat{A}) - C_a(-A)\big)^2}{2\sigma_N^2}\right) p_a(-A) \qquad \text{(A.22b)} \\
&\approx \frac{1}{\sqrt{2\pi}\,\big(\sigma_N/p_a(-A)\big)} \exp\left(-\frac{(\hat{A} - A)^2}{2\big(\sigma_N/p_a(-A)\big)^2}\right) \qquad \text{(A.22c)} \\
&= \frac{1}{\sqrt{2\pi}\,(\gamma/\sqrt{N})} \exp\left(-\frac{(\hat{A} - A)^2}{2(\gamma/\sqrt{N})^2}\right), \qquad \text{(A.22d)}
\end{aligned}$$

where

$$\gamma^2 = C_a(A)\, C_a(-A)\, \big(p_a(-A)\big)^{-2}.$$

Approximation (A.22a) results from using (A.19) in (A.20). To verify (A.22b), note that in the region where $\exp(-[C_a(-\hat{A}) - C_a(-A)]^2/[2\sigma_N^2])$ is essentially non-zero, we have $p_a(-\hat{A}) \approx p_a(-A)$. For $\sigma_N \ll \sigma_a$, the following approximation is valid for the exponent in (A.22b):

$$\big[C_a(-\hat{A}) - C_a(-A)\big]^2 \approx \big(p_a(-A)\big)^2 (\hat{A} - A)^2,$$

which, when substituted in (A.22b), results in (A.22c) and (A.22d). From (A.22d), for large $N$ we have $\hat{A}_{\text{ML}} \sim \mathcal{N}(A, \gamma^2/N)$ in the regime $(-\Delta, \Delta)$. Provided $N$ is large enough, $\gamma/\sqrt{N} \ll \Delta - |A|$, in which case the MSE term contributed from $\hat{A}_{\text{ML}} \in (-\Delta, \Delta)$ approaches the Cramér-Rao bound $\gamma^2/N$. Next, consider the two other regimes, where $\hat{A}_{\text{ML}} = \pm\Delta$. Let $\rho_- = \exp(-\beta_-^2/2)$ and $\rho_+ = \exp(-\beta_+^2/2)$, where $\beta_-$ and $\beta_+$ are given by (A.21). For large enough $N$, $Q(\sqrt{N}\beta_+) \approx c_1 \rho_+^N/\sqrt{N}$ and $Q(\sqrt{N}\beta_-) \approx c_2 \rho_-^N/\sqrt{N}$. Since $0 < \rho_+, \rho_- < 1$, the corresponding MSE terms go to zero much faster than $1/N$ for large $N$, so their contribution to the MSE becomes negligible compared to $\gamma^2/N$. Hence, $\hat{A}_{\text{ML}}$ achieves the Cramér-Rao bound (2.12) for large $N$.
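A quick Monte Carlo check of this result is sketched below under assumptions made only for illustration: $a[n] \sim \mathcal{N}(0, \sigma^2)$ with a symmetric two-level quantizer at zero, so that $C_a(x) = Q(x/\sigma)$ and the interior branch of (A.18) becomes $\hat{A} = \sigma\,\Phi^{-1}(k)$, clipped to $[-\Delta, \Delta]$. The normalized MSE $N\,E[(\hat{A}_{\text{ML}} - A)^2]$ should then be close to $\gamma^2 = 2\pi\sigma^2 Q(A/\sigma)Q(-A/\sigma)e^{A^2/\sigma^2}$. The function names, seed, and simulation sizes are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gamma_sq(a: float, sigma: float) -> float:
    """gamma^2 = C_a(A) C_a(-A) / p_a(-A)^2 for a ~ N(0, sigma^2)."""
    z = a / sigma
    return 2.0 * math.pi * sigma**2 * q_func(z) * q_func(-z) * math.exp(z * z)

def ml_one_bit(num_plus: int, n: int, sigma: float, delta: float) -> float:
    """ML estimate (A.18): sigma * Phi^{-1}(k), clipped to [-delta, delta]."""
    k = num_plus / n
    if k <= 0.0:
        return -delta
    if k >= 1.0:
        return delta
    return max(-delta, min(delta, sigma * STD_NORMAL.inv_cdf(k)))

def normalized_mse(a: float, sigma: float, delta: float, n: int,
                   trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of N * E[(A_ML - A)^2]."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(trials):
        num_plus = sum(1 for _ in range(n) if a + rng.gauss(0.0, sigma) > 0.0)
        sq_err += (ml_one_bit(num_plus, n, sigma, delta) - a) ** 2
    return n * sq_err / trials

a, sigma, delta = 0.3, 1.0, 1.0
print("N*MSE   :", normalized_mse(a, sigma, delta, n=1000, trials=1000))
print("gamma^2 :", gamma_sq(a, sigma))
```

With $|A|$ well inside $(-\Delta, \Delta)$ the clipping branches are essentially never taken at this $N$, so the agreement reflects the Gaussian regime (A.22d).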
A.5 EM Algorithm for Parameter Estimation in Gaussian Noise via Signal Quantizers

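In this appendix we present the derivation of an EM algorithm that can be used to obtain the ML estimate of an unknown parameter from a network of signal quantizers. The $i$th observation $y_i$ is given by

$$y_i = F_i(x_i), \qquad i = 1, 2, \ldots, I,$$

where

$$x_i = A + v_i + w_i, \tag{A.23}$$

$A$ is the unknown parameter of interest, the $v_i$ are independent with $v_i \sim \mathcal{N}(0, \sigma_i^2)$, $w_i$ is the selected (known) control input, and $F_i(\cdot)$ is the $i$th quantizer, given by (2.2). We use $\underline{X}_i(\cdot)$ and $\overline{X}_i(\cdot)$ to denote the functions mapping each quantizer level $Y_m$ of the $i$th quantizer $F_i(\cdot)$ to the associated lower and upper thresholds $X_{m-1}$ and $X_m$, respectively.

We select as the complete set of data the set $x_i$ in (A.23). For convenience, let

$$\mathbf{x} = [x_1\ x_2\ \cdots\ x_I]^T \quad \text{and} \quad \mathbf{y} = [y_1\ y_2\ \cdots\ y_I]^T.$$

The EM algorithm selects the estimate of $A$ at the $(k+1)$st step, based on $\hat{A}_{\text{EM}}^{(k)}$ and $\mathbf{y}$, according to

$$\hat{A}_{\text{EM}}^{(k+1)} = \arg\max_{|\theta| \leq \Delta} U\big(\theta; \hat{A}_{\text{EM}}^{(k)}\big), \tag{A.24}$$

where

$$U\big(\theta; \hat{A}_{\text{EM}}^{(k)}\big) = E\left[\ln p(\mathbf{x}; \theta) \,\middle|\, \mathbf{y}; \hat{A}_{\text{EM}}^{(k)}\right]. \tag{A.25}$$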
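The log-likelihood function $\ln p(\mathbf{x}; \theta)$ satisfies

$$\ln p(\mathbf{x}; \theta) = C - \sum_{i=1}^{I} \frac{(x_i - w_i - \theta)^2}{2\sigma_i^2} = h(\mathbf{x}) + \theta \sum_{i=1}^{I} \frac{x_i - w_i}{\sigma_i^2} - \theta^2 \sum_{i=1}^{I} \frac{1}{2\sigma_i^2}. \tag{A.26}$$

If we substitute the expression (A.26) for $\ln p(\mathbf{x}; \theta)$ in (A.25), we obtain

$$U\big(\theta; \hat{A}_{\text{EM}}^{(k)}\big) = E\left[h(\mathbf{x}) \,\middle|\, \mathbf{y}; \hat{A}_{\text{EM}}^{(k)}\right] + \theta\, E[k] - \frac{\mu}{2}\,\theta^2, \tag{A.27}$$

where $\mu = \sum_{i=1}^{I} \sigma_i^{-2}$, and

$$E[k] = E\left[\sum_{i=1}^{I} \frac{x_i - w_i}{\sigma_i^2} \,\middle|\, \mathbf{y}; \hat{A}_{\text{EM}}^{(k)}\right] = \sum_{i=1}^{I} \frac{E\left[x_i - w_i \,\middle|\, y_i; \hat{A}_{\text{EM}}^{(k)}\right]}{\sigma_i^2}. \tag{A.28}$$

Substituting in (A.24) the expression for $U(\theta; \hat{A}_{\text{EM}}^{(k)})$ in (A.27), we obtain

$$\hat{A}_{\text{EM}}^{(k+1)} = \mathcal{X}_\Delta\big(E[k]/\mu\big), \tag{A.29}$$

where $\mathcal{X}_\Delta(\cdot)$ limits its argument to $[-\Delta, \Delta]$. Let $\underline{x}_i = \underline{X}_i(y_i)$ and $\overline{x}_i = \overline{X}_i(y_i)$. Using

$$p\big(x_i \,\big|\, \mathbf{y}; \hat{A}_{\text{EM}}^{(k)}\big) = p\big(x_i \,\big|\, y_i; \hat{A}_{\text{EM}}^{(k)}\big) = p\big(y_i \,\big|\, x_i; \hat{A}_{\text{EM}}^{(k)}\big)\, p\big(x_i; \hat{A}_{\text{EM}}^{(k)}\big) \left[p\big(y_i; \hat{A}_{\text{EM}}^{(k)}\big)\right]^{-1},$$

we obtain

$$E\left[x_i - w_i \,\middle|\, y_i; \hat{A}_{\text{EM}}^{(k)}\right] = \hat{A}_{\text{EM}}^{(k)} + \frac{\sigma_i}{\sqrt{2\pi}} \, \frac{\exp\left(-\frac{(\underline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i)^2}{2\sigma_i^2}\right) - \exp\left(-\frac{(\overline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i)^2}{2\sigma_i^2}\right)}{Q\left(\frac{\underline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i}{\sigma_i}\right) - Q\left(\frac{\overline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i}{\sigma_i}\right)}, \tag{A.30}$$

which, when substituted in (A.29), results in

$$\hat{A}_{\text{EM}}^{(k+1)} = \mathcal{X}_\Delta\!\left(\hat{A}_{\text{EM}}^{(k)} + \frac{1}{\mu}\sum_{i=1}^{I} \frac{1}{\sqrt{2\pi}\,\sigma_i} \, \frac{\exp\left(-\frac{(\underline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i)^2}{2\sigma_i^2}\right) - \exp\left(-\frac{(\overline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i)^2}{2\sigma_i^2}\right)}{Q\left(\frac{\underline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i}{\sigma_i}\right) - Q\left(\frac{\overline{x}_i - \hat{A}_{\text{EM}}^{(k)} - w_i}{\sigma_i}\right)}\right). \tag{A.31}$$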


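Several special cases of (A.31) are of interest. In particular, if $F_i(x) = F(x) = \operatorname{sgn}(x)$, (A.31) reduces to

$$\hat{A}_{\text{EM}}^{(k+1)} = \mathcal{X}_\Delta\!\left(\hat{A}_{\text{EM}}^{(k)} + \frac{1}{\mu}\sum_{i=1}^{I} \frac{y_i}{\sqrt{2\pi}\,\sigma_i}\, \frac{\exp\left(-\frac{(\hat{A}_{\text{EM}}^{(k)} + w_i)^2}{2\sigma_i^2}\right)}{Q\left(-y_i\,\frac{\hat{A}_{\text{EM}}^{(k)} + w_i}{\sigma_i}\right)}\right). \tag{A.32}$$

Next, consider the special case where $N$ observations are collected from a single $M$-level quantizer (i.e., $F_i(x) = F(x)$ and $I = N$). If, in addition, $w_i = w$ and $\sigma_i = \sigma$ for all $i$, (A.31) reduces to

$$\hat{A}_{\text{EM}}^{(k+1)} = \mathcal{X}_\Delta\!\left(\hat{A}_{\text{EM}}^{(k)} + \frac{\sigma}{\sqrt{2\pi}\,N}\sum_{m=1}^{M} \mathcal{K}_{Y_m}(\mathbf{y})\, \frac{\exp\left(-\frac{(X_{m-1} - \hat{A}_{\text{EM}}^{(k)} - w)^2}{2\sigma^2}\right) - \exp\left(-\frac{(X_m - \hat{A}_{\text{EM}}^{(k)} - w)^2}{2\sigma^2}\right)}{Q\left(\frac{X_{m-1} - \hat{A}_{\text{EM}}^{(k)} - w}{\sigma}\right) - Q\left(\frac{X_m - \hat{A}_{\text{EM}}^{(k)} - w}{\sigma}\right)}\right), \tag{A.33}$$

where $X_0 = -\infty$ and $X_M = \infty$. Note that only the sufficient statistics $\mathcal{K}_{Y_1}(\mathbf{y}), \mathcal{K}_{Y_2}(\mathbf{y}), \ldots, \mathcal{K}_{Y_{M-1}}(\mathbf{y})$ are used in (A.33) to obtain $\hat{A}_{\text{ML}}$, since $\mathcal{K}_{Y_M}(\mathbf{y}) = N - \sum_{m=1}^{M-1} \mathcal{K}_{Y_m}(\mathbf{y})$.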
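A minimal sketch of the binary-quantizer iteration (A.32) follows, assuming $F_i(x) = \operatorname{sgn}(x)$, known controls $w_i$ and noise levels $\sigma_i$, and a limiter $\mathcal{X}_\Delta$ that clips to $[-\Delta, \Delta]$. As a check it uses the case $w_i = 0$, $\sigma_i = 1$, where (A.18) with Gaussian noise gives the closed form $\Phi^{-1}(\text{fraction of } +1\text{'s})$. The data and function names are hypothetical.

```python
import math
from statistics import NormalDist

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def em_step_binary(a_hat, y, w, sigma, delta):
    """One EM update (A.32) for y_i = sgn(A + v_i + w_i), v_i ~ N(0, sigma_i^2)."""
    mu = sum(1.0 / s**2 for s in sigma)
    corr = 0.0
    for yi, wi, si in zip(y, w, sigma):
        z = (a_hat + wi) / si
        corr += yi * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * si * q_func(-yi * z))
    return max(-delta, min(delta, a_hat + corr / mu))

def em_binary(y, w, sigma, delta, a0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate (A.32) until successive estimates differ by less than tol."""
    a_hat = a0
    for _ in range(max_iter):
        a_next = em_step_binary(a_hat, y, w, sigma, delta)
        if abs(a_next - a_hat) < tol:
            return a_next
        a_hat = a_next
    return a_hat

# Hypothetical data: 13 of 20 sensors report +1, no control input, unit noise.
y = [1] * 13 + [-1] * 7
w = [0.0] * len(y)
sigma = [1.0] * len(y)
a_em = em_binary(y, w, sigma, delta=2.0)
a_closed = NormalDist().inv_cdf(13 / 20)  # closed-form ML for this special case
print(a_em, a_closed)
```

Using unequal $\sigma_i$ or nonzero $w_i$ exercises the general case of (A.32), for which no closed form is available.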

A number of Generalized EM (GEM) algorithms can be derived that have interesting connections to the fast algorithms developed for parameter estimation in the presence of feedback. A GEM algorithm results in a sequence of estimates $\hat{A}_{\text{GEM}}^{(k)}$ with the property that at every step they increase $U(\theta; \hat{A}_{\text{GEM}}^{(k)})$ in (A.25) instead of maximizing it, i.e.,

$$U\big(\hat{A}_{\text{GEM}}^{(k+1)}; \hat{A}_{\text{GEM}}^{(k)}\big) \geq U\big(\hat{A}_{\text{GEM}}^{(k)}; \hat{A}_{\text{GEM}}^{(k)}\big). \tag{A.34}$$

Given the $k$th iteration estimate $\hat{A}_{\text{GEM}}^{(k)}$ of this algorithm, the associated $U(\theta; \hat{A}_{\text{GEM}}^{(k)})$ is given by (A.25), where $E[k]$ is given by (A.28)-(A.30) with $\hat{A}_{\text{EM}}^{(k)}$ replaced by $\hat{A}_{\text{GEM}}^{(k)}$, which we may rewrite for convenience as

$$E[k]/\mu = \hat{A}_{\text{GEM}}^{(k)} + f\big(\hat{A}_{\text{GEM}}^{(k)}\big),$$

where $f(\cdot)$ denotes the correction term that (A.30) contributes to (A.31). Consider the following class of iterative algorithms parameterized by $\lambda$:

$$\hat{A}_{\text{GEM}}^{(k+1)} = \mathcal{X}_\Delta\!\left(\hat{A}_{\text{GEM}}^{(k)} + \lambda\, f\big(\hat{A}_{\text{GEM}}^{(k)}\big)\right). \tag{A.35}$$

The algorithm corresponding to $\lambda = 1$ is the EM algorithm (A.31). Substituting $\hat{A}_{\text{GEM}}^{(k+1)}$ from (A.35) in (A.34) reveals that (A.35) satisfies (A.34) if

$$\lambda(\lambda - 2) \leq 0.$$

Thus, for $0 < \lambda < 2$, (A.35) yields a GEM algorithm. The algorithm corresponding to $\lambda = \pi/2$ is of particular importance, especially in the case $M = 2$ where feedback is present. In fact, it is the optimal $\lambda$ when $\hat{A}_{\text{ML}} \approx 0$ in the case $M = 2$ and $w_i = 0$; the case $\hat{A}_{\text{ML}} = 0$ arises when exactly half of the observations equal $+1$. In this case, the convergence rate of the algorithm is given by

$$\lim_{k \to \infty} \frac{\hat{A}_{\text{GEM}}^{(k+1)} - \hat{A}_{\text{ML}}}{\hat{A}_{\text{GEM}}^{(k)} - \hat{A}_{\text{ML}}} = 1 - \frac{2\lambda}{\pi}.$$

From this point of view, the algorithm corresponding to $\lambda = \pi/2$ provides the optimal convergence rate when $\hat{A}_{\text{ML}}$ is close to the quantizer threshold. Consequently, it should not be surprising that its first step corresponds to the algorithm (2.58) when $n > n_0$, which was obtained heuristically and which achieves the optimal information loss in the Gaussian scenario for $M = 2$ in the context of feedback. In general, the GEM algorithm with $\lambda = \pi/2$ reaches the ML estimate in fewer iterations than the EM algorithm for any set of observations, control input sequences, and noise levels.
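The effect of $\lambda$ on convergence can be sketched for the $M = 2$, $w_i = 0$, $\sigma_i = 1$ case. The code below implements (A.35) with $f(\cdot)$ taken from (A.32) and counts iterations to a fixed step tolerance for $\lambda = 1$ (EM) and $\lambda = \pi/2$. The data set (11 of 20 observations equal to $+1$, so that $\hat{A}_{\text{ML}}$ lies near the threshold), the starting point, and the names are hypothetical.

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def correction(a_hat: float, num_plus: int, n: int) -> float:
    """f(A) from (A.32) with w_i = 0 and sigma_i = 1 (so mu = n)."""
    phi = math.exp(-0.5 * a_hat * a_hat) / math.sqrt(2.0 * math.pi)
    num_minus = n - num_plus
    return (num_plus * phi / q_func(-a_hat) - num_minus * phi / q_func(a_hat)) / n

def gem_iterations(lam: float, num_plus: int, n: int, delta: float = 2.0,
                   tol: float = 1e-10, max_iter: int = 10_000):
    """Run (A.35) from 0; return (estimate, iterations until |step| < tol)."""
    a_hat = 0.0
    for it in range(1, max_iter + 1):
        a_next = max(-delta, min(delta, a_hat + lam * correction(a_hat, num_plus, n)))
        if abs(a_next - a_hat) < tol:
            return a_next, it
        a_hat = a_next
    return a_hat, max_iter

for lam in (1.0, math.pi / 2):
    est, its = gem_iterations(lam, num_plus=11, n=20)
    print(f"lambda={lam:.4f}  estimate={est:.10f}  iterations={its}")
```

Both runs converge to the same fixed point, where $f(\cdot)$ vanishes, but the $\lambda = \pi/2$ run needs far fewer iterations, in line with the rate $1 - 2\lambda/\pi$ near the threshold.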

The corresponding EM algorithms for MAP estimation can be readily derived if $A$ is random with a priori distribution $\mathcal{N}(m_A, \sigma_A^2)$. Specifically, we need only replace (A.24) with [14]

$$\hat{A}_{\text{EM}}^{(k+1)} = \arg\max_{\theta}\left[U\big(\theta; \hat{A}_{\text{EM}}^{(k)}\big) - \frac{(\theta - m_A)^2}{2\sigma_A^2}\right], \tag{A.36}$$

which results in the following MAP counterpart of (A.29)

$$\hat{A}_{\text{EM}}^{(k+1)} = \frac{E[k] + m_A/\sigma_A^2}{\mu + 1/\sigma_A^2}, \tag{A.37}$$

where $E[k]$ is given by (A.28) and (A.30). MAP counterparts of (A.32) and (A.33) can also be readily derived.
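For instance, combining (A.37) with the sign-quantizer form of (A.30) gives a MAP iteration whose fixed point lies between the prior mean and the ML estimate. The sketch below assumes $F_i(x) = \operatorname{sgn}(x)$ and a Gaussian prior $\mathcal{N}(m_A, \sigma_A^2)$; all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def cond_mean(a_hat: float, yi: int, wi: float, si: float) -> float:
    """E[x_i - w_i | y_i; a_hat] from (A.30) for a sign quantizer."""
    z = (a_hat + wi) / si
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return a_hat + yi * si * phi / q_func(-yi * z)

def map_em(y, w, sigma, m_a, sigma_a, a0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate the MAP update (A.37) with a N(m_a, sigma_a^2) prior on A."""
    mu = sum(1.0 / s**2 for s in sigma)
    a_hat = a0
    for _ in range(max_iter):
        e_k = sum(cond_mean(a_hat, yi, wi, si) / si**2 for yi, wi, si in zip(y, w, sigma))
        a_next = (e_k + m_a / sigma_a**2) / (mu + 1.0 / sigma_a**2)
        if abs(a_next - a_hat) < tol:
            return a_next
        a_hat = a_next
    return a_hat

y = [1] * 13 + [-1] * 7  # hypothetical sign observations
w = [0.0] * len(y)
sigma = [1.0] * len(y)
a_ml = NormalDist().inv_cdf(13 / 20)  # ML estimate for this special case
for sigma_a in (0.1, 1.0, 1e6):
    print(sigma_a, map_em(y, w, sigma, m_a=0.0, sigma_a=sigma_a), a_ml)
```

A very broad prior recovers the ML estimate, while a very narrow prior pins the estimate to $m_A$.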


A.6 Asymptotically Efficient Estimation for Pseudo-noise Control Inputs

In this appendix, we show that the estimators (2.45)-(2.47) of the parameter $A$ presented in Section 2.3.1 are asymptotically efficient with respect to $\mathbf{y}^N$, where $y[n]$ is given by (2.1) with $F(\cdot)$ given by (2.2), and where $a[n]$ in (2.10) is an IID admissible noise process. In the absence of pseudo-noise, $a[n]$ equals $v[n]$. Consider the following collection of binary sequences

$$y_i[n] = F_i(A + a[n]) = \operatorname{sgn}(A + a[n] - X_i), \qquad i = 1, 2, \ldots, M-1.$$

The observed output $y[n]$ is equivalent to the collection $y_1[n], \ldots, y_{M-1}[n]$, since $y_i[n] = \operatorname{sgn}(y[n] - X_i)$ and, conversely, $y[n]$ is determined by the number of the $y_i[n]$ that equal $+1$. The ML estimate of $A$ based on $\mathbf{y}_i^N = [y_i[1]\ y_i[2]\ \cdots\ y_i[N]]^T$ is given by $\hat{A}_i$ in (2.45). We have

$$\hat{A}_i = \mathcal{X}_\Delta\big(X_i - C_a^{-1}(T_i)\big), \tag{A.38}$$

where

$$T_i = \frac{1}{2N}\sum_{n=1}^{N} y_i[n] + \frac{1}{2}.$$
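The estimators we develop next are based on the vector $\hat{\mathbf{A}}$ defined in (2.46). Note that, although the collection of $T_i$'s is a set of sufficient statistics for the problem, $\hat{\mathbf{A}}$ is not, in general, a sufficient statistic, due to the limiting operation in (A.38). In order to obtain the distribution of $\hat{\mathbf{A}}$ for large $N$, we need the distribution of the vector

$$\mathbf{T} = [T_1\ T_2\ \cdots\ T_{M-1}]^T.$$

For convenience, let $p_i = C_a(X_i - A)$ and $f_i = p_a(X_i - A)$. First note that the distribution of the vector $[\mathcal{K}_{Y_1}(\mathbf{y}^N)\ \mathcal{K}_{Y_2}(\mathbf{y}^N)\ \cdots\ \mathcal{K}_{Y_M}(\mathbf{y}^N)]^T$ is multinomial, and approaches that of a Gaussian vector in the cumulative sense [12]. The $T_i$'s are linear combinations of the $\mathcal{K}_{Y_j}(\mathbf{y}^N)$'s, since $T_i = \frac{1}{N}\sum_{j=i+1}^{M} \mathcal{K}_{Y_j}(\mathbf{y}^N)$. Consequently, $\mathbf{T}$ also approaches a Gaussian vector in the cumulative sense, i.e., $\mathbf{T} \sim \mathcal{N}(\bar{\mathbf{T}}, \mathbf{R}_T)$, where $\mathbf{R}_T = \mathbf{R}/N$, and

$$\bar{\mathbf{T}} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_{M-1} \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix} p_1(1 - p_1) & p_2(1 - p_1) & \cdots & p_{M-1}(1 - p_1) \\ p_2(1 - p_1) & p_2(1 - p_2) & \cdots & p_{M-1}(1 - p_2) \\ \vdots & \vdots & \ddots & \vdots \\ p_{M-1}(1 - p_1) & p_{M-1}(1 - p_2) & \cdots & p_{M-1}(1 - p_{M-1}) \end{bmatrix}.$$

In a manner analogous to the case $M = 2$ described in App. A.4, by using the theorem for the PDF of a transformation of variables [26], and in the limit as $N \to \infty$ (where we invoke the law of large numbers and ignore the boundary effects due to $|\hat{A}_i| \leq \Delta$), we have $\hat{\mathbf{A}} \sim \mathcal{N}(A\,\mathbf{1}, \mathbf{C}/N)$ in the cumulative sense, where $\mathbf{C} = \mathbf{F}^{-1}\mathbf{R}\,\mathbf{F}^{-1}$ and $\mathbf{F} = \operatorname{diag}(f_1, f_2, \ldots, f_{M-1})$. It can be readily verified that

$$\mathbf{C}^{-1} = \begin{bmatrix} a_1 & b_1 & 0 & \cdots & 0 \\ b_1 & a_2 & b_2 & \ddots & \vdots \\ 0 & b_2 & a_3 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & b_{M-2} \\ 0 & \cdots & 0 & b_{M-2} & a_{M-1} \end{bmatrix}, \tag{A.39}$$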
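where

$$a_i = \frac{f_i^2}{p_{i-1} - p_i} + \frac{f_i^2}{p_i - p_{i+1}} \quad \text{and} \quad b_i = \frac{f_i f_{i+1}}{p_{i+1} - p_i},$$

with the conventions $p_0 = 1$ and $p_M = 0$. If $\mathbf{C}$ were available, then the optimal estimate (in terms of minimizing the MSE) would be

$$\hat{A} = \boldsymbol{\lambda}^T \hat{\mathbf{A}} = \left(\mathbf{1}^T \mathbf{C}^{-1} \mathbf{1}\right)^{-1} \mathbf{1}^T \mathbf{C}^{-1} \hat{\mathbf{A}},$$

while the associated MSE would satisfy

$$\lim_{N \to \infty} N\, E\left[(\hat{A} - A)^2\right] = \left(\mathbf{1}^T \mathbf{C}^{-1} \mathbf{1}\right)^{-1} = B(A; y),$$

i.e., this estimator would be asymptotically efficient. However, $\mathbf{C}$ is a function of the unknown parameter $A$. Instead, note that $\mathbf{C}(\hat{A}_i)$ approaches $\mathbf{C}(A)$ for large $N$ for any $i$ (since $\hat{A}_i$ is asymptotically efficient). Specifically, set $i = 1$ and consider

$$\hat{A} = \boldsymbol{\lambda}(\hat{A}_1)^T \hat{\mathbf{A}},$$

where $\boldsymbol{\lambda}^T(\theta) = \left(\mathbf{1}^T \mathbf{C}^{-1}(\theta)\, \mathbf{1}\right)^{-1} \mathbf{1}^T \mathbf{C}^{-1}(\theta)$. Let $z = \hat{A} - A$, $\mathbf{z} = \hat{\mathbf{A}} - A\,\mathbf{1}$, and $\tilde{z} = \boldsymbol{\lambda}(A)^T \mathbf{z}$. Also, let $\Delta\boldsymbol{\lambda} = \boldsymbol{\lambda}(\hat{A}_1) - \boldsymbol{\lambda}(A)$, and denote by $\Delta\lambda_i$ the $i$th element of $\Delta\boldsymbol{\lambda}$. Then,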


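$$\begin{aligned}
\lim_{N \to \infty} N\, E\left[(\hat{A} - A)^2; A\right] &= \lim_{N \to \infty} N\, E\left[z^2; A\right] \\
&= \lim_{N \to \infty} N\, E\left[(\tilde{z} + \Delta\boldsymbol{\lambda}^T \mathbf{z})^2; A\right] \\
&= B(A; y) + \lim_{N \to \infty} N \sum_{i,j} (\beta_{ij} + 2\zeta_{ij}),
\end{aligned} \tag{A.40}$$

where $\beta_{ij} = E[\Delta\lambda_i\, \Delta\lambda_j\, z_i z_j]$ and $\zeta_{ij} = E[\Delta\lambda_i\, \lambda_j\, z_i z_j]$, with $\lambda_j$ the $j$th element of $\boldsymbol{\lambda}(A)$ and $z_i$ the $i$th element of $\mathbf{z}$. Note that in App. A.4 we have shown that $\lim_{N \to \infty} N\, E[(\hat{A}_i - A)^2] = B(A; y_i)$. Since $p_a(\cdot)$ is admissible, for large $N$ we have

$$\Delta\lambda_i \approx (\hat{A}_1 - A)\, \lambda_i'(A).$$

Also, since $\hat{A}_1 - A$ is Gaussian for large $N$ (see App. A.4), so is $\Delta\lambda_i$. In addition, there exists $G$ such that $E[\Delta\lambda_i^2] \leq G/N$ for all $i$, which, by Gaussianity, implies $E[\Delta\lambda_i^4] \leq 3G^2/N^2$. There also exists $U$ such that $E[z_i^2] \leq U/N$ for all $i$, for $N$ large enough. Finally, let $\lambda_{\max} = \max_i |\lambda_i(A)|$. Repeated applications of the Schwarz inequality yield

$$|\beta_{ij}| \leq \left(E[\Delta\lambda_i^4]\, E[\Delta\lambda_j^4]\, E[z_i^4]\, E[z_j^4]\right)^{1/4} \leq \frac{3GU}{N^2}$$

and

$$|\zeta_{ij}| \leq \lambda_{\max}\left(E[\Delta\lambda_i^4]\, E[z_i^4]\, E[z_j^4]\right)^{1/4} \leq \frac{\lambda_{\max}\, 3^{3/4}\, G^{1/2}\, U}{N^{3/2}},$$

which, when substituted in (A.40), result in

$$\lim_{N \to \infty} N\, E\left[(\hat{A} - A)^2; A\right] \leq B(A; y) + \sum_{i,j} \lim_{N \to \infty} N\left(|\beta_{ij}| + 2|\zeta_{ij}|\right) \leq B(A; y) + (M-1)^2 \lim_{N \to \infty}\left(\frac{3GU}{N} + \frac{2\lambda_{\max}\, 3^{3/4}\, G^{1/2}\, U}{N^{1/2}}\right) = B(A; y). \tag{A.41}$$

Since $\hat{A}$ is asymptotically unbiased (as a weighted sum, with weights summing to one, of asymptotically unbiased estimates), for large $N$ we have $E[(\hat{A} - A)^2; A] \geq B(A; y)/N$, which in conjunction with (A.41) yields the desired result (2.48).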
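The tridiagonal form (A.39) and the identity $(\mathbf{1}^T\mathbf{C}^{-1}\mathbf{1})^{-1} = B(A; y)$ can be checked numerically. The sketch below assumes $a[n] \sim \mathcal{N}(0,1)$, so that $p_i = Q(X_i - A)$ and $f_i = \phi(X_i - A)$. It builds $\mathbf{C} = \mathbf{F}^{-1}\mathbf{R}\,\mathbf{F}^{-1}$ and the $\mathbf{C}^{-1}$ of (A.39), verifies that their product is the identity, compares $\mathbf{1}^T\mathbf{C}^{-1}\mathbf{1}$ with the per-sample Fisher information of the full $M$-level output, and forms the weights $\boldsymbol{\lambda}(A)$. The thresholds, the value of $A$, and the function names are illustrative.

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pdf(x: float) -> float:
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def build_matrices(thresholds, a):
    """Return C = F^-1 R F^-1, the tridiagonal C^-1 of (A.39), and p, f."""
    p = [q_func(x - a) for x in thresholds]   # p_i = C_a(X_i - A)
    f = [pdf(x - a) for x in thresholds]      # f_i = p_a(X_i - A)
    m1 = len(p)                               # M - 1 thresholds
    c = [[p[max(i, j)] * (1.0 - p[min(i, j)]) / (f[i] * f[j])
          for j in range(m1)] for i in range(m1)]
    pe = [1.0] + p + [0.0]                    # conventions p_0 = 1, p_M = 0
    c_inv = [[0.0] * m1 for _ in range(m1)]
    for i in range(m1):
        c_inv[i][i] = f[i] ** 2 / (pe[i] - pe[i + 1]) + f[i] ** 2 / (pe[i + 1] - pe[i + 2])
        if i + 1 < m1:
            b = f[i] * f[i + 1] / (pe[i + 2] - pe[i + 1])
            c_inv[i][i + 1] = c_inv[i + 1][i] = b
    return c, c_inv, p, f

def multilevel_fisher(p, f):
    """Per-sample Fisher information of the M-level output, summed over cells."""
    pe, fe = [1.0] + p + [0.0], [0.0] + f + [0.0]
    return sum((fe[m] - fe[m + 1]) ** 2 / (pe[m] - pe[m + 1]) for m in range(len(pe) - 1))

thresholds, a = [-0.5, 0.0, 0.5], 0.2
c, c_inv, p, f = build_matrices(thresholds, a)
m1 = len(thresholds)
product = [[sum(c[i][k] * c_inv[k][j] for k in range(m1)) for j in range(m1)]
           for i in range(m1)]
c_inv_ones = [sum(row) for row in c_inv]      # C^-1 1
info = sum(c_inv_ones)                        # 1^T C^-1 1
weights = [v / info for v in c_inv_ones]      # lambda(A)
print("1^T C^-1 1 =", info, " direct Fisher info =", multilevel_fisher(p, f))
print("weights =", weights)
```

Because $\mathbf{C}^{-1}$ is tridiagonal, the weights $\boldsymbol{\lambda}(\theta)$ only require a tridiagonal matrix-vector product, so evaluating them at $\theta = \hat{A}_1$ is inexpensive.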
Appendix B


B.1 EM Algorithm for Estimation of Gaussian Noise Parameters via Signal Quantizers

In this appendix we present the derivation of an EM algorithm that can be used to obtain the ML estimator of $A$ and $\sigma$ from a network of binary quantizers. The $i$th observation $y_i$ is given by

$$y_i = F_i(x_i) = F_i(A + \sigma v_i + w_i), \qquad i = 1, \ldots, I, \qquad (B.1)$$

where $A$ and $\sigma$ are the unknown parameters of interest, satisfying $|A| < \Delta$ and $\underline{\sigma} < \sigma < \bar{\sigma}$; $v_i$ is an IID sequence with $v_i \sim \mathcal{N}(0,1)$; $w_i$ is a deterministic (known) sequence; and $F_i(\cdot)$ is the $i$th quantizer, given by (2.2). We use $\underline{X}_i(\cdot)$ and $\bar{X}_i(\cdot)$ to denote the functions mapping each quantizer level $Y_m$ of the $i$th quantizer $F_i(\cdot)$ to the associated lower and upper thresholds $X_{m-1}$ and $X_m$, respectively.

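We select as the complete data the set of $x_i$ in (B.1). For convenience we denote by $\mathbf{x}$ and $\mathbf{y}$ the following vectors:

$$\mathbf{x} = \left[x_1\;\; x_2\;\; \cdots\;\; x_I\right]^T \qquad\text{and}\qquad \mathbf{y} = \left[y_1\;\; y_2\;\; \cdots\;\; y_I\right]^T.$$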
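Let $\boldsymbol{\theta}$ denote the vector of parameters that we wish to estimate, i.e.,

$$\boldsymbol{\theta} = \left[A\;\;\sigma\right]^T.$$

The EM algorithm then selects the $(k+1)$-st estimate of $\boldsymbol{\theta}$ based on $\hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}$ according to

$$\hat{\boldsymbol{\theta}}^{(k+1)}_{\rm EM} = \arg\max_{\boldsymbol{\theta}:\;|A|<\Delta,\;\underline{\sigma}<\sigma<\bar{\sigma}} U\left(\boldsymbol{\theta}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right), \qquad (B.2)$$

where

$$U\left(\boldsymbol{\theta}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right) = E\left[\ln p(\mathbf{x}; \boldsymbol{\theta}) \,\middle|\, \mathbf{y}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right]. \qquad (B.3)$$

We have

$$\ln p(\mathbf{x}; \boldsymbol{\theta}) = h(\mathbf{x}) - I\ln\sigma + \frac{A}{\sigma^2}\sum_{i=1}^{I}(x_i - w_i) - \frac{I A^2}{2\sigma^2} - \frac{1}{2\sigma^2}\sum_{i=1}^{I}(x_i - w_i)^2. \qquad (B.4)$$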

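Substituting (B.4) in (B.3) and taking the derivative of $U(\boldsymbol{\theta}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM})$ with respect to $A$ and $\sigma$ yields the next iteration estimate of $\boldsymbol{\theta}$, namely

$$\hat{A}^{(k+1)}_{\rm EM} = T_{(-\Delta,\,\Delta)}\left(B[k]/I\right), \qquad (B.5a)$$

$$\hat{\sigma}^{(k+1)}_{\rm EM} = T_{(\underline{\sigma},\,\bar{\sigma})}\left(\sqrt{\frac{G[k]}{I} + \left[\hat{A}^{(k+1)}_{\rm EM}\right]^2 - \frac{2}{I}\,\hat{A}^{(k+1)}_{\rm EM}\,B[k]}\right), \qquad (B.5b)$$

where

$$B[k] = \sum_{i=1}^{I} E\left[x_i - w_i \,\middle|\, \mathbf{y}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right], \qquad G[k] = \sum_{i=1}^{I} G_i[k], \qquad G_i[k] = E\left[(x_i - w_i)^2 \,\middle|\, \mathbf{y}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right],$$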
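and where $T_{(\underline{x},\,\bar{x})}(\cdot)$ is the following piecewise-linear limiter function:

$$T_{(\underline{x},\,\bar{x})}(x) = \begin{cases} x & \text{if } \underline{x} \le x \le \bar{x}, \\ \underline{x} & \text{if } x < \underline{x}, \\ \bar{x} & \text{if } x > \bar{x}. \end{cases} \qquad (B.6)$$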
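For completeness, the maximization behind (B.5) can be written out. The following is our own intermediate step, with $B[k]$ and $G[k]$ as defined above: since $E[h(\mathbf{x}) \mid \mathbf{y}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}]$ does not depend on $\boldsymbol{\theta}$, substituting (B.4) into (B.3) gives

$$U\left(\boldsymbol{\theta}; \hat{\boldsymbol{\theta}}^{(k)}_{\rm EM}\right) = \text{const} - I\ln\sigma - \frac{G[k] - 2A\,B[k] + I A^2}{2\sigma^2},$$

$$\frac{\partial U}{\partial A} = \frac{B[k] - IA}{\sigma^2} = 0 \;\Rightarrow\; A = \frac{B[k]}{I}, \qquad \frac{\partial U}{\partial \sigma} = -\frac{I}{\sigma} + \frac{G[k] - 2A\,B[k] + IA^2}{\sigma^3} = 0 \;\Rightarrow\; \sigma^2 = \frac{G[k]}{I} + A^2 - \frac{2A\,B[k]}{I}.$$

Because $U$ is a concave quadratic in $A$ whose maximizer does not depend on $\sigma$, and is unimodal in $\sigma$ for fixed $A$, the maximizer over the constraint set in (B.2) is obtained by passing these stationary points through the limiter (B.6), which is exactly the form of (B.5).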


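Letting

$$u_i^{(k)} = \frac{\underline{X}_i(y_i) - \hat{A}^{(k)}_{\rm EM} - w_i}{\hat{\sigma}^{(k)}_{\rm EM}}, \qquad \bar{u}_i^{(k)} = \frac{\bar{X}_i(y_i) - \hat{A}^{(k)}_{\rm EM} - w_i}{\hat{\sigma}^{(k)}_{\rm EM}},$$

and

$$z_i^{(k)} = \underline{X}_i(y_i) + \hat{A}^{(k)}_{\rm EM} - w_i, \qquad \bar{z}_i^{(k)} = \bar{X}_i(y_i) + \hat{A}^{(k)}_{\rm EM} - w_i,$$

the $k$th samples of the sequences $B[k]$ and $G_i[k]$ are given by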


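$$B[k] = I\,\hat{A}^{(k)}_{\rm EM} + \frac{\hat{\sigma}^{(k)}_{\rm EM}}{\sqrt{2\pi}} \sum_{i=1}^{I} \frac{\exp\left(-\left[u_i^{(k)}\right]^2/2\right) - \exp\left(-\left[\bar{u}_i^{(k)}\right]^2/2\right)}{Q\left(u_i^{(k)}\right) - Q\left(\bar{u}_i^{(k)}\right)} \qquad (B.7)$$

and

$$G_i[k] = \left(\hat{A}^{(k)}_{\rm EM}\right)^2 + \left(\hat{\sigma}^{(k)}_{\rm EM}\right)^2 + \frac{\hat{\sigma}^{(k)}_{\rm EM}}{\sqrt{2\pi}}\; \frac{z_i^{(k)}\exp\left(-\left[u_i^{(k)}\right]^2/2\right) - \bar{z}_i^{(k)}\exp\left(-\left[\bar{u}_i^{(k)}\right]^2/2\right)}{Q\left(u_i^{(k)}\right) - Q\left(\bar{u}_i^{(k)}\right)}, \qquad (B.8)$$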


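which, when substituted in (B.5), provide $\hat{A}_{\rm ML}(\mathbf{y})$ and $\hat{\sigma}_{\rm ML}(\mathbf{y})$ in the limit $k \to \infty$. Note that when $\bar{z}_i^{(k)} = \infty$ in (B.8) (and thus also $\bar{u}_i^{(k)} = \infty$), we have

$$\exp\left(-\left[\bar{u}_i^{(k)}\right]^2/2\right)\bar{z}_i^{(k)} = \lim_{u\to\infty}\exp\left(-u^2/2\right)\left(u\,\hat{\sigma}^{(k)}_{\rm EM} + 2\hat{A}^{(k)}_{\rm EM}\right) = 0.$$

Similarly, when $z_i^{(k)} = -\infty$, we have $\exp\left(-\left[u_i^{(k)}\right]^2/2\right) z_i^{(k)} = 0$.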
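A minimal Python sketch of the iteration (B.5)-(B.8) follows, offered as one possible realization rather than the author's code. The Gaussian tail function $Q(\cdot)$ is implemented with `math.erfc`. The quantizer (a single threshold at 0, so each cell is $(0,\infty)$ or $(-\infty,0)$), the control input $w_i$ spread uniformly over $[-2,2]$, the bounds $\Delta = 2$, $\underline{\sigma} = 0.05$, $\bar{\sigma} = 5$, the true values $A = 0.4$, $\sigma = 0.8$, and all function names are illustrative assumptions. The sketch also records the log-likelihood of $\mathbf{y}$ after each iteration, which an EM iteration should never decrease.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def Q(u):
    """Gaussian tail probability P(N(0,1) > u); math.erfc accepts +/-inf."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def limiter(x, lo, hi):
    """Piecewise-linear limiter T_(lo,hi)(x) of (B.6)."""
    return min(max(x, lo), hi)


def gexp(u, z=1.0):
    """exp(-u^2/2) * z, taken as 0 for infinite u (limits noted after (B.8))."""
    return 0.0 if math.isinf(u) else math.exp(-0.5 * u * u) * z


def em_step(A, sig, cells, w, Delta, sig_lo, sig_hi):
    """One EM update (B.5)-(B.8); cells[i] = (lower, upper) thresholds of y_i."""
    I = len(w)
    B, G = I * A, 0.0
    for (lo, hi), wi in zip(cells, w):
        u, ub = (lo - A - wi) / sig, (hi - A - wi) / sig   # u_i^(k), bar u_i^(k)
        z, zb = lo + A - wi, hi + A - wi                    # z_i^(k), bar z_i^(k)
        den = SQRT_2PI * (Q(u) - Q(ub))
        B += sig * (gexp(u) - gexp(ub)) / den                               # (B.7)
        G += A * A + sig * sig + sig * (gexp(u, z) - gexp(ub, zb)) / den    # (B.8)
    A_new = limiter(B / I, -Delta, Delta)                                   # (B.5a)
    var = G / I + A_new ** 2 - 2.0 * A_new * B / I
    return A_new, limiter(math.sqrt(max(var, 0.0)), sig_lo, sig_hi)        # (B.5b)


def log_lik(A, sig, cells, w):
    """Log-likelihood of the quantized observations y."""
    return sum(math.log(Q((lo - A - wi) / sig) - Q((hi - A - wi) / sig))
               for (lo, hi), wi in zip(cells, w))


# Hypothetical example: binary quantizer with threshold 0, known control input w_i.
rng = random.Random(1)
I, A_true, sig_true = 2000, 0.4, 0.8
w = [-2.0 + 4.0 * i / (I - 1) for i in range(I)]
x = [A_true + sig_true * rng.gauss(0.0, 1.0) + wi for wi in w]
cells = [(0.0, math.inf) if xi > 0.0 else (-math.inf, 0.0) for xi in x]

A_hat, sig_hat = 0.0, 1.0
lls = [log_lik(A_hat, sig_hat, cells, w)]
for _ in range(100):
    A_hat, sig_hat = em_step(A_hat, sig_hat, cells, w, 2.0, 0.05, 5.0)
    lls.append(log_lik(A_hat, sig_hat, cells, w))
print(f"A_hat = {A_hat:.3f}, sigma_hat = {sig_hat:.3f}")
```

Each observation enters only through its cell $(\underline{X}_i(y_i), \bar{X}_i(y_i))$, so a multi-level quantizer of the form (2.2) can be handled by supplying its thresholds in `cells`; a practical implementation would also stop once successive estimates stop changing instead of running a fixed number of iterations.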
Appendix C


C.1 Proof of Theorem 1


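In this section we prove the asymptotic optimality of the variable-rate encoding scheme described in Theorem 1. Using (4.11) and (4.9) we obtain

$$E\left[\left(\tilde{A}[N_{k+1}] - \hat{A}[N_{k+1}]\right)^2\right] \le D\,M^{-2k}, \qquad (C.1)$$

where $D$ is a constant satisfying $D > \Delta^2$. Using (C.1) and (4.10b) reveals that

$$\lim_{k\to\infty} N_k\,E\left[\left(\tilde{A}[N_{k+1}] - \hat{A}[N_{k+1}]\right)^2\right] = 0,$$

which in turn implies that

$$\lim_{k\to\infty} \frac{E\left[\left(\tilde{A}[N_k] - A\right)^2\right]}{E\left[\left(\hat{A}[N_k] - A\right)^2\right]} = 1. \qquad (C.2)$$

Also, due to (4.11) and (4.10a) we have

$$\lim_{k\to\infty} \frac{E\left[\left(\hat{A}[N_{k+1}] - A\right)^2\right]}{E\left[\left(\hat{A}[N_k] - A\right)^2\right]} = 1,$$

which in conjunction with (C.2) implies (4.3), proving the asymptotic optimality of the algorithm.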
C.2 Proof of Theorem 2


We will first prove the following lemma, which we will then use to prove Theorem 2.

Lemma 1 Let

$$s[n] = \operatorname{sgn}\left(A - \hat{A}[n]\right).$$

Then the following statements are true for the dynamical system (4.16):

(a) There exist arbitrarily large $n$ such that

$$s[n] = -s[n-1]. \qquad (C.3)$$

(b) If (C.3) is satisfied for some $n'$, then, for all $n \ge n'$,

$$|A - \hat{A}[n]| \le \frac{c}{n}. \qquad (C.4)$$

(c) The set of initial conditions for which (C.3) holds for all $n > n'$, for some $n'$, has measure zero, and these initial conditions satisfy

$$\lim_{n\to\infty} n^2\,|A - \hat{A}[n]|^2 = \frac{c^2}{4}.$$

(d) For almost all initial conditions there exist arbitrarily large $n$ for which

$$s[n] = s[n-1] = -s[n-2]. \qquad (C.5)$$

Proof:

(a) To show that (C.3) is satisfied for arbitrarily large $n$, we assume the opposite is true and arrive at a contradiction. Assume there is an $n'$ such that $s[n] = s[n-1]$ for all $n > n'$. Repeated use of (4.16) then yields

$$|A - \hat{A}[n']| > c\sum_{k=n'+1}^{n}\frac{1}{k},$$

which must be true for all $n > n'$. This, however, is a contradiction, since $\sum_{k=n'}^{\infty} 1/k$ is unbounded for any $n'$.
(b) We can show this by induction; by (a), an $n'$ satisfying (C.3) always exists. Since $A - \hat{A}[n']$ and $A - \hat{A}[n'-1]$ have opposite signs, $A$ lies between $\hat{A}[n'-1]$ and $\hat{A}[n']$, which by (4.16) are $c/n'$ apart; hence (C.4) is satisfied for $n'$. Assuming (C.4) holds for some $n \ge n'$, we show that it also holds for $n+1$. If $s[n+1] = -s[n]$, the same argument shows that (C.4) is satisfied for $n+1$. If, on the other hand, $s[n+1] = s[n]$, then, since (C.4) holds for $n$,

$$|A - \hat{A}[n+1]| = |A - \hat{A}[n]| - \frac{c}{n+1} \le \frac{c}{n} - \frac{c}{n+1} \le \frac{c}{n+1}.$$


(c) Let us assume that (C.3) is satisfied for all $n > n'$, where $n'$ is even. Consider the sequence

$$x[n] = |A - \hat{A}[n]|.$$

Then, for all $n > n'$, we have $x[n] > 0$ and also

$$x[n] = \frac{c}{n} - x[n-1]. \qquad (C.6)$$

Repeated use of (C.6) and the fact that $x[n] > 0$ yield a relationship that must be satisfied for all even $n > n'$:

$$c\sum_{i=n'/2}^{n/2-1}\frac{1}{(2i+1)(2i+2)} < x[n'] < \frac{c}{n'+1} - c\sum_{i=n'/2+1}^{n/2}\frac{1}{2i\,(2i+1)}.$$

Since the limits (as $n \to \infty$) of the upper and lower bounds on $x[n']$ above are equal, we must have

$$x[n'] = c\sum_{i=n'/2}^{\infty}\frac{1}{(2i+1)(2i+2)}. \qquad (C.7)$$

Thus, (C.3) is satisfied for all $n > n'$ if and only if (C.7) holds. Finally, since (C.7) also holds for all even $n > n'$ (with $n'$ replaced by $n$), we have

$$\lim_{k\to\infty} (2k)^2\,|A - \hat{A}[2k]|^2 = \frac{c^2}{4}.$$

The proof for odd $n$ is similar.

(d)  This  is  a  trivial  consequence  of  part  (c). 
The proof of Theorem 2 is a direct consequence of the above lemma. Specifically, (4.17) is trivially implied by part (b) of Lemma 1. In addition, (C.4) implies that

$$\limsup_{n\to\infty} n^2\,|A - \hat{A}[n]|^2 \le c^2.$$

To show that $c^2$ is indeed the upper limit, we employ part (d) of Lemma 1. Since $s[n] = s[n-1]$ in (C.5), $\hat{A}[n]$ does not cross $A$, so that $|A - \hat{A}[n]| = |A - \hat{A}[n-1]| - c/n > 0$. Hence there exist arbitrarily large $n$ for which

$$(n-1)\,|A - \hat{A}[n-1]| > c\left(1 - \frac{1}{n}\right),$$

which completes the proof.
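Lemma 1 and Theorem 2 can be illustrated numerically. The Python sketch below assumes that the dynamical system (4.16) has the form $\hat{A}[n] = \hat{A}[n-1] + (c/n)\,s[n-1]$; this step size is inferred from (C.6) and from the inequality in the proof of part (b), not copied from (4.16). The values $A = 0.7$, $c = 1$, $\hat{A}[1] = 0$, the horizon $N = 10^4$, and the helper name `run_system` are arbitrary choices. The sketch checks the bound (C.4) from the first sign change onward and reports the largest value of $n|A - \hat{A}[n]|$ over the second half of the run, which Theorem 2 suggests should come close to $c$.

```python
import math


def run_system(A, c, A1, N):
    """Iterate the assumed form of (4.16); est[n] holds Â[n] for n = 1..N."""
    est = [float("nan"), A1]                        # index 0 is unused
    for n in range(2, N + 1):
        s_prev = math.copysign(1.0, A - est[n - 1])  # s[n-1]; sgn(0) treated as +1
        est.append(est[n - 1] + c * s_prev / n)
    return est


A, c, N = 0.7, 1.0, 10_000
est = run_system(A, c, 0.0, N)
sign = [math.copysign(1.0, A - e) for e in est]

# First n' at which (C.3) holds, i.e. s[n'] = -s[n'-1].
n_first = next(n for n in range(2, N + 1) if sign[n] != sign[n - 1])

# Part (b): |A - Â[n]| <= c/n for every n >= n' (tiny slack for rounding).
bound_ok = all(abs(A - est[n]) <= c / n + 1e-12 for n in range(n_first, N + 1))

# Theorem 2: n |A - Â[n]| should come close to c at arbitrarily large n.
peak = max(n * abs(A - est[n]) for n in range(N // 2, N + 1))
print(n_first, bound_ok, round(peak, 4))
```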


C.3 Asymptotic Optimality of the Digital Encoding and Estimation Algorithms of Sections 4.3.1-4.3.3


In  this  appendix  we  show  the  asymptotic  optimality  of  the  sequential  encoder/estimator 
structures  of  the  form  (4.19)  presented  in  Sections  4.3.1-4.3.3. 

We first show the asymptotic efficiency of the fixed-rate encoder/estimator scheme of Section 4.3.1, consisting of (4.12) and (4.19). In the process we also derive the relationship (4.25) between $\beta$ and $\lambda$. We will assume that (4.21) holds for large $n$ and $n+1$, and find the value of $\beta$ as $n \to \infty$. Since the resulting value is neither 0 nor $\infty$, the residual error indeed decays as $1/n^2$. By exploiting

$$\tilde{A}[n+1] = \frac{n}{n+1}\,\tilde{A}[n] + \frac{1}{n+1}\,s[n+1]$$

and (4.19b), we have


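$$E\left[\left(\tilde{A}[n+1] - \hat{A}[n]\right)^2\right] = E\left[\left(\frac{n}{n+1}\left(\tilde{A}[n] - \hat{A}[n-1]\right) + \frac{1}{n+1}\left(s[n+1] - \hat{A}[n-1]\right) - \frac{\lambda\sigma_v}{n}\operatorname{sgn}\left(\tilde{A}[n] - \hat{A}[n-1]\right)\right)^2\right]. \qquad (C.8)$$

For convenience, let $\eta_1 = n\left(\tilde{A}[n] - \hat{A}[n-1]\right)/(n+1)$, $\eta_2 = \left(s[n+1] - \hat{A}[n-1]\right)/(n+1)$, and $\eta_3 = \lambda\sigma_v\operatorname{sgn}\left(\tilde{A}[n] - \hat{A}[n-1]\right)/n$. Then, from (4.22),

$$\lim_{n\to\infty} n^2\,E[\eta_1^2] = \beta\sigma_v^2 \qquad (C.9a)$$

and also

$$\lim_{n\to\infty} n^2\,E[\eta_3^2] = \lambda^2\sigma_v^2. \qquad (C.9b)$$

The terms of the form $E[\eta_2\,\eta_i]$ for $i \ne 2$ decay faster than $1/n^2$, while

$$\lim_{n\to\infty} n^2\,E[\eta_2^2] = \sigma_v^2. \qquad (C.9c)$$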


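The term in (C.8) corresponding to $E[\eta_1\eta_3]$ reduces to

$$E[\eta_1\eta_3] = \frac{\lambda\sigma_v}{n+1}\,E\left[\left|\tilde{A}[n] - \hat{A}[n-1]\right|\right]. \qquad (C.10a)$$

Using (4.22) and the Schwarz inequality reveals that $E[|\tilde{A}[n] - \hat{A}[n-1]|]$ decays as $1/n$ or faster, i.e.,

$$\lim_{n\to\infty} n\,E\left[\left|\tilde{A}[n] - \hat{A}[n-1]\right|\right] = C, \qquad (C.10b)$$

where $0 \le C < \infty$ is a function of $\lambda$, as is $\beta$. By multiplying both sides of (C.8) by $n^2$, taking the limit as $n \to \infty$, and using (C.9) and (C.10b), we obtain

$$\beta\sigma_v^2 = \beta\sigma_v^2 + \sigma_v^2 + \lambda^2\sigma_v^2 - 2\lambda\sigma_v C. \qquad (C.11)$$

If $\lambda$ is chosen so that $0 < \lambda < \infty$, (C.11) reveals that $0 < C < \infty$, which verifies that $E[|\tilde{A}[n] - \hat{A}[n-1]|]$ decays as $1/n$. Since $E[X^2] \ge (E[|X|])^2$ for any random variable $X$, this in turn implies that $E[(\tilde{A}[n] - \hat{A}[n-1])^2]$ cannot decay faster than $1/n^2$ (i.e., $\beta > 0$). In fact, for large $n$, the PDF of $\tilde{A}[n] - \hat{A}[n-1]$ is well approximated by a Gaussian PDF in the cumulative sense. Consequently, since a zero-mean Gaussian random variable $X$ satisfies $E[|X|] = \sqrt{2/\pi}\,\sqrt{E[X^2]}$,

$$C(\beta) = \sqrt{2/\pi}\,\sqrt{\beta}\,\sigma_v, \qquad (C.12)$$

which, when substituted into (C.11), results in (4.25).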
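Carrying out that substitution explicitly gives the following (our own algebra, which should agree with (4.25) up to how it is written there). The $\beta\sigma_v^2$ terms on the two sides of (C.11) cancel, leaving

$$C = \frac{(1+\lambda^2)\,\sigma_v}{2\lambda},$$

and equating this with (C.12) gives

$$\sqrt{\frac{2\beta}{\pi}} = \frac{1+\lambda^2}{2\lambda} \quad\Longrightarrow\quad \beta = \frac{\pi\,(1+\lambda^2)^2}{8\lambda^2} = \frac{\pi}{8}\left(\lambda + \frac{1}{\lambda}\right)^2,$$

which is minimized at $\lambda = 1$, where $\beta = \pi/2$.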
The accuracy of the Gaussian PDF approximation is examined in Fig. C-1, where we depict the results of Monte-Carlo simulations along with the associated quantities predicted by (4.25). Although this figure only shows the validity of the analysis for $\lambda = 1$, the approximation has proved remarkably accurate over the wide range of $\lambda$ values that have been tested.
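The Gaussian approximation behind (C.12) can also be probed without simulating the full encoder. Multiplying the recursion inside (C.8) by $n/\sigma_v$ suggests (our own large-$n$ reduction, not a statement from the text) that $f[n] = n(\tilde{A}[n+1] - \hat{A}[n])/\sigma_v$ behaves like the Markov chain $f \leftarrow f + w - \lambda\operatorname{sgn}(f)$ with $w \sim \mathcal{N}(0,1)$. For that chain, balancing $E[f^2]$ gives $E[|f|] = (1+\lambda^2)/(2\lambda)$ exactly, which is (C.11) in scaled form, whereas $E[f^2]$ matches the $\beta$ implied by (C.11) and (C.12) only as far as the Gaussian approximation holds. The run lengths and seed in this Python sketch are arbitrary, and `chain_moments` and `beta_gauss` are hypothetical helper names.

```python
import math
import random


def beta_gauss(lam):
    """beta obtained by combining (C.11) with (C.12)."""
    return math.pi * (1.0 + lam * lam) ** 2 / (8.0 * lam * lam)


def chain_moments(lam, steps=200_000, burn=2_000, seed=0):
    """Sample E|f| and E[f^2] for the chain f <- f + w - lam*sgn(f), w ~ N(0,1)."""
    rng = random.Random(seed)
    f, m1, m2 = 0.0, 0.0, 0.0
    for k in range(burn + steps):
        # The right-hand side is evaluated with the current f before updating it.
        f += rng.gauss(0.0, 1.0) - lam * math.copysign(1.0, f)
        if k >= burn:
            m1 += abs(f)
            m2 += f * f
    return m1 / steps, m2 / steps


lam = 1.0
m1, m2 = chain_moments(lam)
print(f"E|f|   = {m1:.3f}  (exact value {(1 + lam ** 2) / (2 * lam):.3f})")
print(f"E[f^2] = {m2:.3f}  (Gaussian-approximation beta {beta_gauss(lam):.3f})")
```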

The analysis for the encoder/estimator system (4.12)-(4.19b) applies intact when the noise is non-Gaussian, provided that $v[n]$ has finite variance, thus establishing the asymptotic optimality of the encoder (4.12) and (4.19) in finite-variance sensor noise, as described in Section 4.3.2.
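The statement that the analysis carries over to finite-variance non-Gaussian noise can be spot-checked by simulation. The Python sketch below assumes that (4.12)-(4.19b) reduce to the sign-feedback recursion $\hat{A}[n] = \hat{A}[n-1] + (\lambda\sigma_v/n)\operatorname{sgn}(\tilde{A}[n] - \hat{A}[n-1])$, with $\tilde{A}[n]$ the sample mean of $s[1], \ldots, s[n]$. This form is inferred from (C.8); $\hat{A}[0] = 0$, the Laplacian noise, all numerical values, and the helper name `scaled_residual` are likewise assumptions. It averages $n^2(\tilde{A}[n+1] - \hat{A}[n])^2/\sigma_v^2$ over two windows of $n$; if the residual error decays as $1/n^2$, the two averages should roughly agree.

```python
import math
import random


def scaled_residual(lam=1.0, A=0.3, sigma_v=1.0, N=1200, trials=200,
                    windows=((200, 400), (1000, 1200)), seed=0):
    """Average of (n (Ã[n+1] - Â[n]) / sigma_v)^2 over each window of n."""
    rng = random.Random(seed)
    b = sigma_v / math.sqrt(2.0)        # Laplace scale giving variance sigma_v^2
    sums = [0.0] * len(windows)
    counts = [0] * len(windows)
    for _ in range(trials):
        total, A_hat = 0.0, 0.0          # A_hat holds Â[n-1]; Â[0] = 0 is assumed
        for n in range(1, N + 2):
            # s[n] = A + v[n]; v[n] is Laplacian (difference of two exponentials)
            total += A + b * (rng.expovariate(1.0) - rng.expovariate(1.0))
            r = total / n - A_hat        # Ã[n] - Â[n-1], the residual at index n-1
            m = n - 1
            for j, (lo, hi) in enumerate(windows):
                if lo <= m < hi:
                    sums[j] += (m * r / sigma_v) ** 2
                    counts[j] += 1
            A_hat += lam * sigma_v * math.copysign(1.0, r) / n   # assumed (4.19b)
    return [s / k for s, k in zip(sums, counts)]


early, late = scaled_residual()
print(f"scaled residual: {early:.3f} (n in [200, 400)), {late:.3f} (n in [1000, 1200))")
```

The common value need not equal the Gaussian-noise prediction of (4.25); only its stability across $n$, i.e., the $1/n^2$ decay, is being checked here.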

To show that the associated results hold for an encoder/estimator of the form (4.19), where $\tilde{A}[n]$ is any estimator satisfying (4.27) and where the noise $v[n]$ is admissible, we again write

$$E\left[\left(\tilde{A}[n+1] - \hat{A}[n]\right)^2\right] = E\left[\left(\tilde{A}[n] - \hat{A}[n-1] + \epsilon[n+1] - \frac{\lambda\sigma_v}{n}\operatorname{sgn}\left(\tilde{A}[n] - \hat{A}[n-1]\right)\right)^2\right].$$

Similarly to the analysis of the sample-mean-based encoder, we make the associations $\eta_1 = \tilde{A}[n] - \hat{A}[n-1]$, $\eta_2 = \epsilon[n+1]$, and $\eta_3 = \lambda\sigma_v\operatorname{sgn}\left(\tilde{A}[n] - \hat{A}[n-1]\right)/n$. The terms $E[\eta_1^2]$, $E[\eta_2^2]$, $E[\eta_3^2]$, and $E[\eta_1\eta_3]$ are given by (4.21), (4.27), (C.9b), and (C.10), respectively. Finally, we can easily show that the terms $E[\eta_2\eta_3]$ and $E[\eta_1\eta_2]$ also decay at least as fast as $1/n^2$ by using the fact that $E[\eta_2^2]$ decays as $1/n^2$, together with the Schwarz inequality. All these together imply that the residual error decays as the reciprocal of the square of the number of observations. By ignoring the terms $E[\eta_2\eta_3]$ and $E[\eta_1\eta_2]$ we obtain an estimate for the scaling $\beta$ of the residual error term in (4.29).
[Figure C-1 panels: (a) MSE performance of $\hat{A}[n]$ (solid) and Cramér-Rao bound $B(A; s^n)$ (dashed); (c) $E\left[|\tilde{A}[n] - \hat{A}[n-1]|\right]$ vs. $n$.]

Figure C-1: Validity of the residual error analysis for $\lambda = 1$ for Gaussian $v[n]$. The solid lines on the lower two figures depict the results of Monte-Carlo simulations. The dashed curves correspond to the associated estimates obtained via the Gaussian approximation, leading to the value of $\beta$ in (4.25). The dotted curve on the top-left figure denotes $B(A; s^n)$.